[GitHub] carbondata pull request #2299: [CARBONDATA-2472] Fixed:NonTransactional tabl...
Github user gvramana commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2299#discussion_r187843922 --- Diff: core/src/main/java/org/apache/carbondata/core/util/DataTypeUtil.java --- @@ -401,6 +401,11 @@ public static Object getDataDataTypeForNoDictionaryColumn(String dimensionValue, } else if (actualDataType == DataTypes.LONG) { return ByteUtil.toBytes((Long) dimensionValue); } else if (actualDataType == DataTypes.TIMESTAMP) { + if (dimensionValue instanceof String) { --- End diff -- Avro supports timestamp as logical type. So need not support this, as avro supports specific data type ---
[GitHub] carbondata pull request #2299: [CARBONDATA-2472] Fixed:NonTransactional tabl...
Github user gvramana commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2299#discussion_r187842996 --- Diff: core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletDataMapFactory.java --- @@ -115,13 +123,55 @@ public DataMapBuilder createBuilder(Segment segment, String shardName) { Set tableBlockIndexUniqueIdentifiers = segmentMap.get(segment.getSegmentNo()); if (tableBlockIndexUniqueIdentifiers == null) { + CarbonTable carbonTable = this.getCarbonTable(); + if (!carbonTable.getTableInfo().isTransactionalTable()) { +// For NonTransactional table, compare the schema of all index files with inferred schema. +// If there is a mismatch throw exception. As all files must be of same schema. +validateSchemaForNewTranscationalTableFiles(segment, carbonTable); + } tableBlockIndexUniqueIdentifiers = BlockletDataMapUtil.getTableBlockUniqueIdentifiers(segment); segmentMap.put(segment.getSegmentNo(), tableBlockIndexUniqueIdentifiers); } return tableBlockIndexUniqueIdentifiers; } + private void validateSchemaForNewTranscationalTableFiles(Segment segment, CarbonTable carbonTable) + throws IOException { +SchemaConverter schemaConverter = new ThriftWrapperSchemaConverterImpl(); +MapindexFiles = segment.getCommittedIndexFile(); +for (Map.Entry indexFileEntry : indexFiles.entrySet()) { + Path indexFile = new Path(indexFileEntry.getKey()); + org.apache.carbondata.format.TableInfo tableInfo = CarbonUtil.inferSchemaFromIndexFile( + indexFile.toString(), carbonTable.getTableName()); + TableInfo wrapperTableInfo = schemaConverter.fromExternalToWrapperTableInfo( + tableInfo, identifier.getDatabaseName(), + identifier.getTableName(), + identifier.getTablePath()); + List indexFileColumnList = + wrapperTableInfo.getFactTable().getListOfColumns(); + List tableColumnList = + carbonTable.getTableInfo().getFactTable().getListOfColumns(); + if (!compareColumnSchemaList(indexFileColumnList, tableColumnList)) { +LOG.error("Schema of " + indexFile.getName() ++ " doesn't match with the table's schema"); +throw new IOException("All the files doesn't have same schema. " ++ "Unsupported operation on nonTransactional table. Check logs."); + } +} + } + + private boolean compareColumnSchemaList(List indexFileColumnList, --- End diff -- 1) Add validation logic the place where read is happening. otherwise files will be read 2 two times. 2) All the blocklet index loading distribution logics also will work fine if handled while reading. ---
[GitHub] carbondata pull request #2299: [CARBONDATA-2472] Fixed:NonTransactional tabl...
Github user xuchuanyin commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2299#discussion_r187768276 --- Diff: core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletDataMapFactory.java --- @@ -115,13 +123,55 @@ public DataMapBuilder createBuilder(Segment segment, String shardName) { Set tableBlockIndexUniqueIdentifiers = segmentMap.get(segment.getSegmentNo()); if (tableBlockIndexUniqueIdentifiers == null) { + CarbonTable carbonTable = this.getCarbonTable(); + if (!carbonTable.getTableInfo().isTransactionalTable()) { +// For NonTransactional table, compare the schema of all index files with inferred schema. +// If there is a mismatch throw exception. As all files must be of same schema. +validateSchemaForNewTranscationalTableFiles(segment, carbonTable); + } tableBlockIndexUniqueIdentifiers = BlockletDataMapUtil.getTableBlockUniqueIdentifiers(segment); segmentMap.put(segment.getSegmentNo(), tableBlockIndexUniqueIdentifiers); } return tableBlockIndexUniqueIdentifiers; } + private void validateSchemaForNewTranscationalTableFiles(Segment segment, CarbonTable carbonTable) + throws IOException { +SchemaConverter schemaConverter = new ThriftWrapperSchemaConverterImpl(); +MapindexFiles = segment.getCommittedIndexFile(); +for (Map.Entry indexFileEntry : indexFiles.entrySet()) { + Path indexFile = new Path(indexFileEntry.getKey()); --- End diff -- It seems it's unnecessary to new a path object. ---
[GitHub] carbondata pull request #2299: [CARBONDATA-2472] Fixed:NonTransactional tabl...
Github user xuchuanyin commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2299#discussion_r187768333 --- Diff: core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletDataMapFactory.java --- @@ -115,13 +123,55 @@ public DataMapBuilder createBuilder(Segment segment, String shardName) { Set tableBlockIndexUniqueIdentifiers = segmentMap.get(segment.getSegmentNo()); if (tableBlockIndexUniqueIdentifiers == null) { + CarbonTable carbonTable = this.getCarbonTable(); + if (!carbonTable.getTableInfo().isTransactionalTable()) { +// For NonTransactional table, compare the schema of all index files with inferred schema. +// If there is a mismatch throw exception. As all files must be of same schema. +validateSchemaForNewTranscationalTableFiles(segment, carbonTable); + } tableBlockIndexUniqueIdentifiers = BlockletDataMapUtil.getTableBlockUniqueIdentifiers(segment); segmentMap.put(segment.getSegmentNo(), tableBlockIndexUniqueIdentifiers); } return tableBlockIndexUniqueIdentifiers; } + private void validateSchemaForNewTranscationalTableFiles(Segment segment, CarbonTable carbonTable) + throws IOException { +SchemaConverter schemaConverter = new ThriftWrapperSchemaConverterImpl(); +MapindexFiles = segment.getCommittedIndexFile(); +for (Map.Entry indexFileEntry : indexFiles.entrySet()) { + Path indexFile = new Path(indexFileEntry.getKey()); + org.apache.carbondata.format.TableInfo tableInfo = CarbonUtil.inferSchemaFromIndexFile( + indexFile.toString(), carbonTable.getTableName()); + TableInfo wrapperTableInfo = schemaConverter.fromExternalToWrapperTableInfo( + tableInfo, identifier.getDatabaseName(), + identifier.getTableName(), + identifier.getTablePath()); + List indexFileColumnList = + wrapperTableInfo.getFactTable().getListOfColumns(); + List tableColumnList = + carbonTable.getTableInfo().getFactTable().getListOfColumns(); + if (!compareColumnSchemaList(indexFileColumnList, tableColumnList)) { +LOG.error("Schema of " + indexFile.getName() ++ " doesn't match with the table's schema"); +throw new IOException("All the files doesn't have same schema. " ++ "Unsupported operation on nonTransactional table. Check logs."); + } +} + } + + private boolean compareColumnSchemaList(List indexFileColumnList, --- End diff -- please optimize the method name, such as `isColumnSchemaSame` ---
[GitHub] carbondata pull request #2299: [CARBONDATA-2472] Fixed:NonTransactional tabl...
Github user xuchuanyin commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2299#discussion_r187768121 --- Diff: core/src/main/java/org/apache/carbondata/core/datamap/DataMapStoreManager.java --- @@ -297,7 +297,7 @@ public TableDataMap registerDataMap(CarbonTable table, dataMapSchema, dataMapFactory, blockletDetailsFetcher, segmentPropertiesFetcher); tableIndices.add(dataMap); -allDataMaps.put(tableUniqueName, tableIndices); +allDataMaps.put(tableUniqueName.toLowerCase(), tableIndices); --- End diff -- The name is already lowercased here, no need to convert it here. If it is not, there maybe other bugs that cause it. ---
[GitHub] carbondata pull request #2299: [CARBONDATA-2472] Fixed:NonTransactional tabl...
Github user xuchuanyin commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2299#discussion_r187768312 --- Diff: core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletDataMapFactory.java --- @@ -115,13 +123,55 @@ public DataMapBuilder createBuilder(Segment segment, String shardName) { Set tableBlockIndexUniqueIdentifiers = segmentMap.get(segment.getSegmentNo()); if (tableBlockIndexUniqueIdentifiers == null) { + CarbonTable carbonTable = this.getCarbonTable(); + if (!carbonTable.getTableInfo().isTransactionalTable()) { +// For NonTransactional table, compare the schema of all index files with inferred schema. +// If there is a mismatch throw exception. As all files must be of same schema. +validateSchemaForNewTranscationalTableFiles(segment, carbonTable); + } tableBlockIndexUniqueIdentifiers = BlockletDataMapUtil.getTableBlockUniqueIdentifiers(segment); segmentMap.put(segment.getSegmentNo(), tableBlockIndexUniqueIdentifiers); } return tableBlockIndexUniqueIdentifiers; } + private void validateSchemaForNewTranscationalTableFiles(Segment segment, CarbonTable carbonTable) + throws IOException { +SchemaConverter schemaConverter = new ThriftWrapperSchemaConverterImpl(); +MapindexFiles = segment.getCommittedIndexFile(); +for (Map.Entry indexFileEntry : indexFiles.entrySet()) { + Path indexFile = new Path(indexFileEntry.getKey()); + org.apache.carbondata.format.TableInfo tableInfo = CarbonUtil.inferSchemaFromIndexFile( + indexFile.toString(), carbonTable.getTableName()); + TableInfo wrapperTableInfo = schemaConverter.fromExternalToWrapperTableInfo( + tableInfo, identifier.getDatabaseName(), + identifier.getTableName(), + identifier.getTablePath()); + List indexFileColumnList = + wrapperTableInfo.getFactTable().getListOfColumns(); + List tableColumnList = + carbonTable.getTableInfo().getFactTable().getListOfColumns(); + if (!compareColumnSchemaList(indexFileColumnList, tableColumnList)) { +LOG.error("Schema of " + indexFile.getName() ++ " doesn't match with the table's schema"); +throw new IOException("All the files doesn't have same schema. " ++ "Unsupported operation on nonTransactional table. Check logs."); + } +} + } + + private boolean compareColumnSchemaList(List indexFileColumnList, + List tableColumnList) { +if (indexFileColumnList.size() != tableColumnList.size()) { + return false; --- End diff -- Can you add a log here and line170 to tell the reason. ---
[GitHub] carbondata pull request #2299: [CARBONDATA-2472] Fixed:NonTransactional tabl...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2299#discussion_r187762940 --- Diff: core/src/main/java/org/apache/carbondata/core/datamap/DataMapStoreManager.java --- @@ -297,7 +297,7 @@ public TableDataMap registerDataMap(CarbonTable table, dataMapSchema, dataMapFactory, blockletDetailsFetcher, segmentPropertiesFetcher); tableIndices.add(dataMap); -allDataMaps.put(tableUniqueName, tableIndices); +allDataMaps.put(tableUniqueName.toLowerCase(), tableIndices); --- End diff -- Is this a bug? It seems not related to this PR ---
[GitHub] carbondata pull request #2299: [CARBONDATA-2472] Fixed:NonTransactional tabl...
GitHub user ajantha-bhat opened a pull request: https://github.com/apache/carbondata/pull/2299 [CARBONDATA-2472] Fixed:NonTransactional table performance degarde issue induced by PR #2273 Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily: - [ ] Any interfaces changed? - [ ] Any backward compatibility impacted? - [ ] Document update required? - [ ] Testing done Please provide details on - Whether new unit test cases have been added or why no new tests are required? - How it is tested? Please attach test report. - Is it a performance related change? Please attach the performance test report. - Any additional information to help reviewers in testing this change. - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. You can merge this pull request into a Git repository by running: $ git pull https://github.com/ajantha-bhat/carbondata unmanaged_table Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/2299.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2299 commit 0846033a4ea91bf6e97a8a32dcbf853c2d98fdb0 Author: ajantha-bhatDate: 2018-05-11T08:58:46Z [CARBONDATA-2472] Fixed:NonTransactional table performance degarde issue induced by PR #2273 ---