[GitHub] carbondata pull request #2299: [CARBONDATA-2472] Fixed:NonTransactional tabl...

2018-05-14 Thread gvramana
Github user gvramana commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2299#discussion_r187843922
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/util/DataTypeUtil.java ---
@@ -401,6 +401,11 @@ public static Object 
getDataDataTypeForNoDictionaryColumn(String dimensionValue,
 } else if (actualDataType == DataTypes.LONG) {
   return ByteUtil.toBytes((Long) dimensionValue);
 } else if (actualDataType == DataTypes.TIMESTAMP) {
+  if (dimensionValue instanceof String) {
--- End diff --

Avro supports timestamp as logical type. So need not support this, as avro 
supports specific data type


---


[GitHub] carbondata pull request #2299: [CARBONDATA-2472] Fixed:NonTransactional tabl...

2018-05-14 Thread gvramana
Github user gvramana commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2299#discussion_r187842996
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletDataMapFactory.java
 ---
@@ -115,13 +123,55 @@ public DataMapBuilder createBuilder(Segment segment, 
String shardName) {
 Set tableBlockIndexUniqueIdentifiers =
 segmentMap.get(segment.getSegmentNo());
 if (tableBlockIndexUniqueIdentifiers == null) {
+  CarbonTable carbonTable = this.getCarbonTable();
+  if (!carbonTable.getTableInfo().isTransactionalTable()) {
+// For NonTransactional table, compare the schema of all index 
files with inferred schema.
+// If there is a mismatch throw exception. As all files must be of 
same schema.
+validateSchemaForNewTranscationalTableFiles(segment, carbonTable);
+  }
   tableBlockIndexUniqueIdentifiers =
   BlockletDataMapUtil.getTableBlockUniqueIdentifiers(segment);
   segmentMap.put(segment.getSegmentNo(), 
tableBlockIndexUniqueIdentifiers);
 }
 return tableBlockIndexUniqueIdentifiers;
   }
 
+  private void validateSchemaForNewTranscationalTableFiles(Segment 
segment, CarbonTable carbonTable)
+  throws IOException {
+SchemaConverter schemaConverter = new 
ThriftWrapperSchemaConverterImpl();
+Map indexFiles = segment.getCommittedIndexFile();
+for (Map.Entry indexFileEntry : indexFiles.entrySet()) 
{
+  Path indexFile = new Path(indexFileEntry.getKey());
+  org.apache.carbondata.format.TableInfo tableInfo = 
CarbonUtil.inferSchemaFromIndexFile(
+  indexFile.toString(), carbonTable.getTableName());
+  TableInfo wrapperTableInfo = 
schemaConverter.fromExternalToWrapperTableInfo(
+  tableInfo, identifier.getDatabaseName(),
+  identifier.getTableName(),
+  identifier.getTablePath());
+  List indexFileColumnList =
+  wrapperTableInfo.getFactTable().getListOfColumns();
+  List tableColumnList =
+  carbonTable.getTableInfo().getFactTable().getListOfColumns();
+  if (!compareColumnSchemaList(indexFileColumnList, tableColumnList)) {
+LOG.error("Schema of " + indexFile.getName()
++ " doesn't match with the table's schema");
+throw new IOException("All the files doesn't have same schema. "
++ "Unsupported operation on nonTransactional table. Check 
logs.");
+  }
+}
+  }
+
+  private boolean compareColumnSchemaList(List 
indexFileColumnList,
--- End diff --

1) Add validation logic the place where read is happening. otherwise files 
will be read 2 two times.
2) All the blocklet index loading distribution logics also will work fine 
if handled while reading.


---


[GitHub] carbondata pull request #2299: [CARBONDATA-2472] Fixed:NonTransactional tabl...

2018-05-12 Thread xuchuanyin
Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2299#discussion_r187768276
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletDataMapFactory.java
 ---
@@ -115,13 +123,55 @@ public DataMapBuilder createBuilder(Segment segment, 
String shardName) {
 Set tableBlockIndexUniqueIdentifiers =
 segmentMap.get(segment.getSegmentNo());
 if (tableBlockIndexUniqueIdentifiers == null) {
+  CarbonTable carbonTable = this.getCarbonTable();
+  if (!carbonTable.getTableInfo().isTransactionalTable()) {
+// For NonTransactional table, compare the schema of all index 
files with inferred schema.
+// If there is a mismatch throw exception. As all files must be of 
same schema.
+validateSchemaForNewTranscationalTableFiles(segment, carbonTable);
+  }
   tableBlockIndexUniqueIdentifiers =
   BlockletDataMapUtil.getTableBlockUniqueIdentifiers(segment);
   segmentMap.put(segment.getSegmentNo(), 
tableBlockIndexUniqueIdentifiers);
 }
 return tableBlockIndexUniqueIdentifiers;
   }
 
+  private void validateSchemaForNewTranscationalTableFiles(Segment 
segment, CarbonTable carbonTable)
+  throws IOException {
+SchemaConverter schemaConverter = new 
ThriftWrapperSchemaConverterImpl();
+Map indexFiles = segment.getCommittedIndexFile();
+for (Map.Entry indexFileEntry : indexFiles.entrySet()) 
{
+  Path indexFile = new Path(indexFileEntry.getKey());
--- End diff --

It seems it's unnecessary to new a path object.


---


[GitHub] carbondata pull request #2299: [CARBONDATA-2472] Fixed:NonTransactional tabl...

2018-05-12 Thread xuchuanyin
Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2299#discussion_r187768333
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletDataMapFactory.java
 ---
@@ -115,13 +123,55 @@ public DataMapBuilder createBuilder(Segment segment, 
String shardName) {
 Set tableBlockIndexUniqueIdentifiers =
 segmentMap.get(segment.getSegmentNo());
 if (tableBlockIndexUniqueIdentifiers == null) {
+  CarbonTable carbonTable = this.getCarbonTable();
+  if (!carbonTable.getTableInfo().isTransactionalTable()) {
+// For NonTransactional table, compare the schema of all index 
files with inferred schema.
+// If there is a mismatch throw exception. As all files must be of 
same schema.
+validateSchemaForNewTranscationalTableFiles(segment, carbonTable);
+  }
   tableBlockIndexUniqueIdentifiers =
   BlockletDataMapUtil.getTableBlockUniqueIdentifiers(segment);
   segmentMap.put(segment.getSegmentNo(), 
tableBlockIndexUniqueIdentifiers);
 }
 return tableBlockIndexUniqueIdentifiers;
   }
 
+  private void validateSchemaForNewTranscationalTableFiles(Segment 
segment, CarbonTable carbonTable)
+  throws IOException {
+SchemaConverter schemaConverter = new 
ThriftWrapperSchemaConverterImpl();
+Map indexFiles = segment.getCommittedIndexFile();
+for (Map.Entry indexFileEntry : indexFiles.entrySet()) 
{
+  Path indexFile = new Path(indexFileEntry.getKey());
+  org.apache.carbondata.format.TableInfo tableInfo = 
CarbonUtil.inferSchemaFromIndexFile(
+  indexFile.toString(), carbonTable.getTableName());
+  TableInfo wrapperTableInfo = 
schemaConverter.fromExternalToWrapperTableInfo(
+  tableInfo, identifier.getDatabaseName(),
+  identifier.getTableName(),
+  identifier.getTablePath());
+  List indexFileColumnList =
+  wrapperTableInfo.getFactTable().getListOfColumns();
+  List tableColumnList =
+  carbonTable.getTableInfo().getFactTable().getListOfColumns();
+  if (!compareColumnSchemaList(indexFileColumnList, tableColumnList)) {
+LOG.error("Schema of " + indexFile.getName()
++ " doesn't match with the table's schema");
+throw new IOException("All the files doesn't have same schema. "
++ "Unsupported operation on nonTransactional table. Check 
logs.");
+  }
+}
+  }
+
+  private boolean compareColumnSchemaList(List 
indexFileColumnList,
--- End diff --

please optimize the method name, such as `isColumnSchemaSame`


---


[GitHub] carbondata pull request #2299: [CARBONDATA-2472] Fixed:NonTransactional tabl...

2018-05-12 Thread xuchuanyin
Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2299#discussion_r187768121
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datamap/DataMapStoreManager.java 
---
@@ -297,7 +297,7 @@ public TableDataMap registerDataMap(CarbonTable table,
 dataMapSchema, dataMapFactory, blockletDetailsFetcher, 
segmentPropertiesFetcher);
 
 tableIndices.add(dataMap);
-allDataMaps.put(tableUniqueName, tableIndices);
+allDataMaps.put(tableUniqueName.toLowerCase(), tableIndices);
--- End diff --

The name is already lowercased here, no need to convert it here. If it is 
not, there maybe other bugs that cause it.



---


[GitHub] carbondata pull request #2299: [CARBONDATA-2472] Fixed:NonTransactional tabl...

2018-05-12 Thread xuchuanyin
Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2299#discussion_r187768312
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletDataMapFactory.java
 ---
@@ -115,13 +123,55 @@ public DataMapBuilder createBuilder(Segment segment, 
String shardName) {
 Set tableBlockIndexUniqueIdentifiers =
 segmentMap.get(segment.getSegmentNo());
 if (tableBlockIndexUniqueIdentifiers == null) {
+  CarbonTable carbonTable = this.getCarbonTable();
+  if (!carbonTable.getTableInfo().isTransactionalTable()) {
+// For NonTransactional table, compare the schema of all index 
files with inferred schema.
+// If there is a mismatch throw exception. As all files must be of 
same schema.
+validateSchemaForNewTranscationalTableFiles(segment, carbonTable);
+  }
   tableBlockIndexUniqueIdentifiers =
   BlockletDataMapUtil.getTableBlockUniqueIdentifiers(segment);
   segmentMap.put(segment.getSegmentNo(), 
tableBlockIndexUniqueIdentifiers);
 }
 return tableBlockIndexUniqueIdentifiers;
   }
 
+  private void validateSchemaForNewTranscationalTableFiles(Segment 
segment, CarbonTable carbonTable)
+  throws IOException {
+SchemaConverter schemaConverter = new 
ThriftWrapperSchemaConverterImpl();
+Map indexFiles = segment.getCommittedIndexFile();
+for (Map.Entry indexFileEntry : indexFiles.entrySet()) 
{
+  Path indexFile = new Path(indexFileEntry.getKey());
+  org.apache.carbondata.format.TableInfo tableInfo = 
CarbonUtil.inferSchemaFromIndexFile(
+  indexFile.toString(), carbonTable.getTableName());
+  TableInfo wrapperTableInfo = 
schemaConverter.fromExternalToWrapperTableInfo(
+  tableInfo, identifier.getDatabaseName(),
+  identifier.getTableName(),
+  identifier.getTablePath());
+  List indexFileColumnList =
+  wrapperTableInfo.getFactTable().getListOfColumns();
+  List tableColumnList =
+  carbonTable.getTableInfo().getFactTable().getListOfColumns();
+  if (!compareColumnSchemaList(indexFileColumnList, tableColumnList)) {
+LOG.error("Schema of " + indexFile.getName()
++ " doesn't match with the table's schema");
+throw new IOException("All the files doesn't have same schema. "
++ "Unsupported operation on nonTransactional table. Check 
logs.");
+  }
+}
+  }
+
+  private boolean compareColumnSchemaList(List 
indexFileColumnList,
+  List tableColumnList) {
+if (indexFileColumnList.size() != tableColumnList.size()) {
+  return false;
--- End diff --

Can you add a log here and line170 to tell the reason.


---


[GitHub] carbondata pull request #2299: [CARBONDATA-2472] Fixed:NonTransactional tabl...

2018-05-11 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2299#discussion_r187762940
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datamap/DataMapStoreManager.java 
---
@@ -297,7 +297,7 @@ public TableDataMap registerDataMap(CarbonTable table,
 dataMapSchema, dataMapFactory, blockletDetailsFetcher, 
segmentPropertiesFetcher);
 
 tableIndices.add(dataMap);
-allDataMaps.put(tableUniqueName, tableIndices);
+allDataMaps.put(tableUniqueName.toLowerCase(), tableIndices);
--- End diff --

Is this a bug? It seems not related to this PR


---


[GitHub] carbondata pull request #2299: [CARBONDATA-2472] Fixed:NonTransactional tabl...

2018-05-11 Thread ajantha-bhat
GitHub user ajantha-bhat opened a pull request:

https://github.com/apache/carbondata/pull/2299

[CARBONDATA-2472] Fixed:NonTransactional table performance degarde issue 
induced by PR #2273




Be sure to do all of the following checklist to help us incorporate 
your contribution quickly and easily:

 - [ ] Any interfaces changed?
 
 - [ ] Any backward compatibility impacted?
 
 - [ ] Document update required?

 - [ ] Testing done
Please provide details on 
- Whether new unit test cases have been added or why no new tests 
are required?
- How it is tested? Please attach test report.
- Is it a performance related change? Please attach the performance 
test report.
- Any additional information to help reviewers in testing this 
change.
   
 - [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ajantha-bhat/carbondata unmanaged_table

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/2299.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2299


commit 0846033a4ea91bf6e97a8a32dcbf853c2d98fdb0
Author: ajantha-bhat 
Date:   2018-05-11T08:58:46Z

[CARBONDATA-2472] Fixed:NonTransactional table performance degarde issue 
induced by PR #2273




---