[GitHub] carbondata issue #2273: [CARBONDATA-2442] and [CARBONDATA-2469] Fixed: Readi...
Github user ajantha-bhat commented on the issue: https://github.com/apache/carbondata/pull/2273 retest this please ---
[jira] [Created] (CARBONDATA-2469) External Table must show its location instead of default store path in describe formatted
Ajantha Bhat created CARBONDATA-2469: Summary: External Table must show its location instead of default store path in describe formatted Key: CARBONDATA-2469 URL: https://issues.apache.org/jira/browse/CARBONDATA-2469 Project: CarbonData Issue Type: Bug Reporter: Ajantha Bhat -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] carbondata issue #2289: [CARBONDATA-2435] Remove SDK dependency on spark jar...
Github user ajantha-bhat commented on the issue: https://github.com/apache/carbondata/pull/2289 @jackylk : ObjectSizeCalculator is called when ReverseDictonaryCache is initialized [please refer to the call stack in the JIRA issue], but the reverse dictionary cache does not need to be initialized for SDK flows. It is needed only for dictionary encoding, and the SDK doesn't support dictionary encoding. So I have moved the initialization of the reverse dictionary cache inside the dictionary encoding check. In the SDK flow this code is no longer called, so the Spark class does not need to be found and we don't get the exception reported in the issue. ---
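The fix described in the comment above can be pictured with a minimal sketch. It is not the CarbonData code: ReverseDictionaryCacheStub and initCacheIfNeeded are hypothetical names that only stand in for "a cache whose construction pulls in an optional dependency" and "the dictionary encoding check".

```java
public class LazyCacheSketch {
    /** Hypothetical stand-in for a cache whose constructor would load an optional (Spark) class. */
    static final class ReverseDictionaryCacheStub {
        static int constructed = 0;

        ReverseDictionaryCacheStub() {
            constructed++;
        }
    }

    /** Builds the cache only when dictionary encoding is in use; otherwise the class is never touched. */
    static ReverseDictionaryCacheStub initCacheIfNeeded(boolean usesDictionaryEncoding) {
        if (!usesDictionaryEncoding) {
            return null; // SDK-style flow: no cache, so no optional dependency is needed
        }
        return new ReverseDictionaryCacheStub();
    }

    public static void main(String[] args) {
        System.out.println("SDK flow built a cache: " + (initCacheIfNeeded(false) != null));
        System.out.println("dictionary flow built a cache: " + (initCacheIfNeeded(true) != null));
    }
}
```

Because the constructor call sits inside the guarded branch, a flow that never uses dictionary encoding never triggers loading of the class behind it.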
[GitHub] carbondata pull request #2273: [CARBONDATA-2442] Fixed: Reading two sdk writ...
Github user ajantha-bhat commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2273#discussion_r187136017 --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonTableInputFormat.java --- @@ -151,6 +154,33 @@ public CarbonTable getOrCreateCarbonTable(Configuration configuration) throws IO SegmentStatusManager segmentStatusManager = new SegmentStatusManager(identifier); SegmentStatusManager.ValidAndInvalidSegmentsInfo segments = segmentStatusManager .getValidAndInvalidSegments(loadMetadataDetails, this.readCommittedScope); + +// For NonTransactional table, compare the schema of all index files with inferred schema. +// If there is a mismatch throw exception. As all files must be of same schema. +if (!carbonTable.getTableInfo().isTransactionalTable()) { + SchemaConverter schemaConverter = new ThriftWrapperSchemaConverterImpl(); + for (Segment segment : segments.getValidSegments()) { +Map indexFiles = segment.getCommittedIndexFile(); +for (Map.Entry indexFileEntry : indexFiles.entrySet()) { + Path indexFile = new Path(indexFileEntry.getKey()); + org.apache.carbondata.format.TableInfo tableInfo = CarbonUtil.inferSchemaFromIndexFile( + indexFile.toString(), carbonTable.getTableName()); + TableInfo wrapperTableInfo = schemaConverter.fromExternalToWrapperTableInfo( + tableInfo, identifier.getDatabaseName(), + identifier.getTableName(), + identifier.getTablePath()); + List indexFileColumnList = + wrapperTableInfo.getFactTable().getListOfColumns(); + List tableColumnList = + carbonTable.getTableInfo().getFactTable().getListOfColumns(); + if (!compareColumnSchemaList(indexFileColumnList, tableColumnList)) { +throw new IOException("All the files schema doesn't match. " --- End diff -- @kumarvishal09 :Tested with parquet by having 2 files with same column name but different data type. parquet throws java.lang.UnsupportedOperationException during read. 
Caused by: java.lang.UnsupportedOperationException: Unimplemented type: StringType at org.apache.spark.sql.execution.datasources.parquet.VectorizedColumnReader.readIntBatch(VectorizedColumnReader.java:369) at org.apache.spark.sql.execution.datasources.parquet.VectorizedColumnReader.readBatch(VectorizedColumnReader.java:188) at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:230) at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.nextKeyValue(VectorizedParquetRecordReader.java:137) at org.apache.spark.sql.execution.datasources.RecordReaderIterator.hasNext(RecordReaderIterator.scala:39) at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:105) at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:177) at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:105) at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.scan_nextBatch$(Unknown Source) at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source) at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:395) at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:234) at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:228) at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827) at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323) at org.apache.spark.rdd.RDD.iterator(RDD.scala:287) at 
org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87) at org.apache.spark.scheduler.Task.run(Task.scala:108) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) ---
[GitHub] carbondata issue #2273: [CARBONDATA-2442] Fixed: Reading two sdk writer outp...
Github user ajantha-bhat commented on the issue: https://github.com/apache/carbondata/pull/2273 retest this please ---
[GitHub] carbondata issue #2289: [CARBONDATA-2435] Remove SDK dependency on spark jar...
Github user ajantha-bhat commented on the issue: https://github.com/apache/carbondata/pull/2289 @ravipesala : please review the PR ---
[GitHub] carbondata pull request #2289: [CARBONDATA-2435] Remove SDK dependency on sp...
GitHub user ajantha-bhat opened a pull request: https://github.com/apache/carbondata/pull/2289 [CARBONDATA-2435] Remove SDK dependency on spark jars. [CARBONDATA-2435] Remove SDK dependency on spark jars. Problem and cause : when sdk writer is used in standalone cluster without spark jars, exception is thrown during reverse dictionary cache initialize time. Solution: carbon SDK doesn't support dictionary encoding, This spark dependency is only for dictionary encoding. Move the spark dependency code inside dictionary encoding if block. So that SDK flow will not have to access spark class. Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily: - [ ] Any interfaces changed? NA - [ ] Any backward compatibility impacted?NA - [ ] Document update required?NA - [ ] Testing done. NA - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. NA You can merge this pull request into a Git repository by running: $ git pull https://github.com/ajantha-bhat/carbondata master_new Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/2289.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2289 commit 14d95921795aaac4dff9cfe46ffd118e3fdf7388 Author: ajantha-bhat Date: 2018-05-09T12:37:56Z [CARBONDATA-2435] Remove SDK dependency on spark jars. Problem and cause : when sdk writer is used in standalone cluster without spark jars, exception is thrown during reverse dictionary cache initialize time. Solution: carbon SDK doesn't support dictionary encoding, This spark dependency is only for dictionary encoding. so, move the spark dependency code inside dictionary encoding if block. So, that SDK flow will not have to access spark class. ---
[GitHub] carbondata pull request #2273: [CARBONDATA-2442] Fixed: Reading two sdk writ...
Github user ajantha-bhat commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2273#discussion_r187007180 --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonTableInputFormat.java --- @@ -151,6 +154,33 @@ public CarbonTable getOrCreateCarbonTable(Configuration configuration) throws IO SegmentStatusManager segmentStatusManager = new SegmentStatusManager(identifier); SegmentStatusManager.ValidAndInvalidSegmentsInfo segments = segmentStatusManager .getValidAndInvalidSegments(loadMetadataDetails, this.readCommittedScope); + +// For NonTransactional table, compare the schema of all index files with inferred schema. +// If there is a mismatch throw exception. As all files must be of same schema. +if (!carbonTable.getTableInfo().isTransactionalTable()) { + SchemaConverter schemaConverter = new ThriftWrapperSchemaConverterImpl(); + for (Segment segment : segments.getValidSegments()) { +Map indexFiles = segment.getCommittedIndexFile(); +for (Map.Entry indexFileEntry : indexFiles.entrySet()) { + Path indexFile = new Path(indexFileEntry.getKey()); + org.apache.carbondata.format.TableInfo tableInfo = CarbonUtil.inferSchemaFromIndexFile( + indexFile.toString(), carbonTable.getTableName()); + TableInfo wrapperTableInfo = schemaConverter.fromExternalToWrapperTableInfo( + tableInfo, identifier.getDatabaseName(), + identifier.getTableName(), + identifier.getTablePath()); + List indexFileColumnList = + wrapperTableInfo.getFactTable().getListOfColumns(); + List tableColumnList = + carbonTable.getTableInfo().getFactTable().getListOfColumns(); + if (!compareColumnSchemaList(indexFileColumnList, tableColumnList)) { +throw new IOException("All the files schema doesn't match. " --- End diff -- @kunal642 : For nonTransactional tables, we support many sdk writers output files to be placed and read from same folder. This works when schema is same, If schema is different we have to inform user that these files are not of same type. 
If we just ignore the files, how would the user know why they were ignored? Hence the exception. ---
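A minimal sketch of the "fail loudly on schema mismatch" behaviour discussed above. The Column class and validateSameSchema method are hypothetical, and the comparison here is order-sensitive on name and data type only, which is an assumption; the PR's compareColumnSchemaList may compare differently.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

public class SchemaMismatchSketch {
    /** Hypothetical, simplified column description: name and data type only. */
    static final class Column {
        final String name;
        final String dataType;

        Column(String name, String dataType) {
            this.name = name;
            this.dataType = dataType;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Column)) {
                return false;
            }
            Column other = (Column) o;
            return name.equals(other.name) && dataType.equals(other.dataType);
        }

        @Override
        public int hashCode() {
            return Objects.hash(name, dataType);
        }
    }

    /** Throws instead of silently skipping a file whose schema differs from the table schema. */
    static void validateSameSchema(List<Column> tableColumns, List<List<Column>> perFileColumns)
            throws IOException {
        for (int i = 0; i < perFileColumns.size(); i++) {
            if (!tableColumns.equals(perFileColumns.get(i))) {
                throw new IOException("Schema of index file #" + i + " does not match the table schema");
            }
        }
    }

    public static void main(String[] args) {
        List<Column> table = Arrays.asList(new Column("name", "string"), new Column("age", "int"));
        List<Column> other = Arrays.asList(new Column("name", "string"), new Column("age", "string"));
        try {
            validateSameSchema(table, Arrays.asList(table, other));
        } catch (IOException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```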
[GitHub] carbondata issue #2283: [CARBONDATA-2457] Add converter to get Carbon SDK Sc...
Github user ajantha-bhat commented on the issue: https://github.com/apache/carbondata/pull/2283 @ravipesala : PR is ready ---
[GitHub] carbondata issue #2283: [CARBONDATA-2457] Add converter to get Carbon SDK Sc...
Github user ajantha-bhat commented on the issue: https://github.com/apache/carbondata/pull/2283 retest this please ---
[GitHub] carbondata pull request #2283: [CARBONDATA-2457] Add converter to get Carbon...
Github user ajantha-bhat commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2283#discussion_r186939573 --- Diff: store/sdk/src/main/java/org/apache/carbondata/sdk/file/AvroCarbonWriter.java --- @@ -118,11 +121,131 @@ private Object avroFieldToObject(Schema.Field avroField, Object fieldValue) { break; default: -throw new UnsupportedOperationException(); +throw new UnsupportedOperationException( +"carbon not support " + type.toString() + " avro type yet"); } return out; } + /** + * converts avro schema to carbon schema required by carbonWriter + * + * @param avroSchemaString json formatted avro schema as string + * @return carbon sdk schema + */ + public static org.apache.carbondata.sdk.file.Schema getCarbonSchemaFromAvroSchema( + String avroSchemaString) { +if (avroSchemaString == null) { + throw new UnsupportedOperationException("avro schema string cannot be null"); +} +Schema avroSchema = new Schema.Parser().parse(avroSchemaString); +Field[] carbonField = new Field[avroSchema.getFields().size()]; +int i = 0; +for (Schema.Field avroField : avroSchema.getFields()) { + carbonField[i] = prepareFields(avroField.name(), avroField.schema()); + i++; +} +return new org.apache.carbondata.sdk.file.Schema(carbonField); + } + + private static Field prepareFields(String FieldName, Schema childSchema) { --- End diff -- OK. fixed ---
[GitHub] carbondata issue #2283: [CARBONDATA-2457] Add converter to get Carbon SDK Sc...
Github user ajantha-bhat commented on the issue: https://github.com/apache/carbondata/pull/2283 retest this please ---
[GitHub] carbondata issue #2273: [CARBONDATA-2442] Fixed: Reading two sdk writer outp...
Github user ajantha-bhat commented on the issue: https://github.com/apache/carbondata/pull/2273 retest this please ---
[GitHub] carbondata pull request #2273: [CARBONDATA-2442] Fixed: Reading two sdk writ...
Github user ajantha-bhat commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2273#discussion_r186725643 --- Diff: core/src/main/java/org/apache/carbondata/core/metadata/schema/table/column/ColumnSchema.java --- @@ -342,6 +342,30 @@ public void setParentColumnTableRelations( return true; } + /** + * method to compare columnSchema, + * other parameters along with just column name and column data type + * @param obj + * @return + */ + public boolean equalsWithStrictCheck(Object obj) { +if (!this.equals(obj)) { + return false; +} +ColumnSchema other = (ColumnSchema) obj; +if (!columnUniqueId.equals(other.columnUniqueId) || +(isDimensionColumn != other.isDimensionColumn) || +(scale != other.scale) || +(precision != other.precision) || +(isSortColumn != other.isSortColumn)) { + return false; +} +if (encodingList.size() != other.encodingList.size()) { --- End diff -- done. added ---
[GitHub] carbondata pull request #2283: [CARBONDATA-2457] Add converter to get Carbon...
GitHub user ajantha-bhat opened a pull request: https://github.com/apache/carbondata/pull/2283 [CARBONDATA-2457] Add converter to get Carbon SDK Schema from Avro schema directly. [CARBONDATA-2457] Add converter to get Carbon SDK Schema from Avro schema directly. In the current implementation, SDK users have to manually create carbon schema of fields from avro schema. This is time-consuming and error-prone. Also, user should not be worried about this logic. So, abstract the carbon schema creation from avro schema by exposing a method to user. Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily: - [ ] Any interfaces changed? Added new interface, not modified existing - [ ] Any backward compatibility impacted? NA - [ ] Document update required? will be handled in separate PR - [ ] Testing done yes, updated the test case in TestNonTransactionalCarbonTable.scala - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. NA You can merge this pull request into a Git repository by running: $ git pull https://github.com/ajantha-bhat/carbondata multi_level_complex Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/2283.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2283 commit a65a6fd0f6663061ddbc7da146e56e8367e75c9f Author: ajantha-bhat Date: 2018-05-08T06:42:37Z [CARBONDATA-2457] Added converter to get Carbon SDK Schema from Avro schema directly. In the current implementation, SDK users have to manually create carbon schema of fields from avro schema. This is time consuming and error prone. Also user should not be worried about this logic. So, abstract the carbon schema creation from avro schema by exposing a method to user. ---
[jira] [Updated] (CARBONDATA-2457) Add converter to get Carbon SDK Schema from Avro schema directly.
[ https://issues.apache.org/jira/browse/CARBONDATA-2457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat updated CARBONDATA-2457: - Description: In the current implementation, SDK users have to manually create carbon schema of fields from avro schema. This is time-consuming and error-prone. Also, users should not have to worry about this logic. So, abstract the carbon schema creation from avro schema by exposing a method to user. > Add converter to get Carbon SDK Schema from Avro schema directly. > - > > Key: CARBONDATA-2457 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2457 > Project: CarbonData > Issue Type: Sub-task > Reporter: Ajantha Bhat >Priority: Major > > In the current implementation, SDK users have to manually create carbon > schema of fields from avro schema. This is time-consuming and error-prone. > Also, users should not have to worry about this logic. > So, abstract the carbon schema creation from avro schema by exposing a method > to user. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (CARBONDATA-2457) Add converter to get Carbon SDK Schema from Avro schema directly.
[ https://issues.apache.org/jira/browse/CARBONDATA-2457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat updated CARBONDATA-2457: - Summary: Add converter to get Carbon SDK Schema from Avro schema directly. (was: Add converter to get Carbon sdk Schema from Avro schema directly.) > Add converter to get Carbon SDK Schema from Avro schema directly. > - > > Key: CARBONDATA-2457 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2457 > Project: CarbonData > Issue Type: Sub-task > Reporter: Ajantha Bhat >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-2457) Add converter to get Carbon sdk Schema from Avro schema directly.
Ajantha Bhat created CARBONDATA-2457: Summary: Add converter to get Carbon sdk Schema from Avro schema directly. Key: CARBONDATA-2457 URL: https://issues.apache.org/jira/browse/CARBONDATA-2457 Project: CarbonData Issue Type: Sub-task Reporter: Ajantha Bhat -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] carbondata issue #2276: [CARBONDATA-2443][SDK]Multi level complex type suppo...
Github user ajantha-bhat commented on the issue: https://github.com/apache/carbondata/pull/2276 retest this Please ---
[GitHub] carbondata issue #2276: [CARBONDATA-2443][SDK]Multi level complex type suppo...
Github user ajantha-bhat commented on the issue: https://github.com/apache/carbondata/pull/2276 retest this please ---
[GitHub] carbondata issue #2273: [CARBONDATA-2442] Fixed: Reading two sdk writer outp...
Github user ajantha-bhat commented on the issue: https://github.com/apache/carbondata/pull/2273 retest this please ---
[GitHub] carbondata issue #2273: [CARBONDATA-2442] Fixed: Reading two sdk writer outp...
Github user ajantha-bhat commented on the issue: https://github.com/apache/carbondata/pull/2273 retest this please ---
[GitHub] carbondata pull request #2273: [CARBONDATA-2442] Reading two sdk writer outp...
GitHub user ajantha-bhat opened a pull request: https://github.com/apache/carbondata/pull/2273 [CARBONDATA-2442] Reading two sdk writer output with different schema should prompt exception [CARBONDATA-2442] Reading two sdk writer output with different schema should prompt exception **problem** : when the outputs of two SDK writers with different schemas are placed in the same folder for reading, the output is not as expected. It contains many null values. **root cause:** when multiple carbondata and index files are placed in the same folder, the table schema is inferred from the first file. There is no validation that compares the table schema with the schema of every other index file. **solution:** compare the table schema with the schema of every other index file; if there is a mismatch, throw an exception. Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily: - [ ] Any interfaces changed? NA - [ ] Any backward compatibility impacted? NA - [ ] Document update required? NA - [ ] Testing done added the test case in TestNonTransactionalCarbonTable.scala - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. NA You can merge this pull request into a Git repository by running: $ git pull https://github.com/ajantha-bhat/carbondata schema_mismatch Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/2273.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2273 commit 03ab5f30909fe971e6221fdab83bdeb2f81e385c Author: ajantha-bhat Date: 2018-05-05T11:29:44Z [CARBONDATA-2442] Reading two sdk writer output with different schema should prompt exception problem : when the outputs of two SDK writers with different schemas are placed in the same folder for reading, the output is not as expected. It contains many null values. root cause: when multiple carbondata and index files are placed in the same folder, the table schema is inferred from the first file.
There is no validation that compares the table schema with the schema of every other index file. solution: compare the table schema with the schema of every other index file; if there is a mismatch, throw an exception. ---
[jira] [Created] (CARBONDATA-2442) Reading two sdk writer output with different schema should prompt exception
Ajantha Bhat created CARBONDATA-2442: Summary: Reading two sdk writer output with different schema should prompt exception Key: CARBONDATA-2442 URL: https://issues.apache.org/jira/browse/CARBONDATA-2442 Project: CarbonData Issue Type: Bug Reporter: Ajantha Bhat -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Reopened] (CARBONDATA-2369) Add a document for Non Transactional table with SDK writer guide
[ https://issues.apache.org/jira/browse/CARBONDATA-2369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat reopened CARBONDATA-2369: -- Add about avro complex type > Add a document for Non Transactional table with SDK writer guide > > > Key: CARBONDATA-2369 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2369 > Project: CarbonData > Issue Type: Sub-task > Reporter: Ajantha Bhat > Assignee: Ajantha Bhat >Priority: Minor > Time Spent: 6.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Closed] (CARBONDATA-2369) Add a document for Non Transactional table with SDK writer guide
[ https://issues.apache.org/jira/browse/CARBONDATA-2369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat closed CARBONDATA-2369. Resolution: Fixed updated in PR #2198 > Add a document for Non Transactional table with SDK writer guide > > > Key: CARBONDATA-2369 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2369 > Project: CarbonData > Issue Type: Sub-task > Reporter: Ajantha Bhat > Assignee: Ajantha Bhat >Priority: Minor > Time Spent: 6.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (CARBONDATA-2369) Add a document for Non Transactional table with SDK writer guide
[ https://issues.apache.org/jira/browse/CARBONDATA-2369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat reassigned CARBONDATA-2369: Assignee: Ajantha Bhat > Add a document for Non Transactional table with SDK writer guide > > > Key: CARBONDATA-2369 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2369 > Project: CarbonData > Issue Type: Sub-task > Reporter: Ajantha Bhat > Assignee: Ajantha Bhat >Priority: Minor > Time Spent: 6.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] carbondata issue #2251: [CARBONDATA-2417] SDK writer goes to infinite wait w...
Github user ajantha-bhat commented on the issue: https://github.com/apache/carbondata/pull/2251 retest this please ---
[GitHub] carbondata issue #2251: [CARBONDATA-2417] SDK writer goes to infinite wait w...
Github user ajantha-bhat commented on the issue: https://github.com/apache/carbondata/pull/2251 @ravipesala : handled as per comments. ---
[GitHub] carbondata issue #2240: [CARBONDATA-2313] fixed multiple issues in SDK write...
Github user ajantha-bhat commented on the issue: https://github.com/apache/carbondata/pull/2240 retest this please ---
[GitHub] carbondata issue #2251: [CARBONDATA-2417] SDK writer goes to infinite wait w...
Github user ajantha-bhat commented on the issue: https://github.com/apache/carbondata/pull/2251 @ravipesala : Agreed, but CarbonOutputIteratorWrapper.closeWriter will also try to put a batch into the queue. So before putting to the queue, it must check whether the consumer is dead or just busy. That is why the above code changes are required. ---
[GitHub] carbondata issue #2224: [CARBONDATA-2393]TaskNo is not working for SDK
Github user ajantha-bhat commented on the issue: https://github.com/apache/carbondata/pull/2224 retest this please ---
[GitHub] carbondata pull request #2251: [CARBONDATA-2417] SDK writer goes to infinite...
GitHub user ajantha-bhat opened a pull request: https://github.com/apache/carbondata/pull/2251 [CARBONDATA-2417] SDK writer goes to infinite wait when consumer thread is dead [CARBONDATA-2417] problem: SDK writer goes to infinite wait when consumer thread is dead root cause: when a bad record causes an exception in the consumer thread during write, the failure is not propagated to the producer (the SDK writer). So the SDK keeps writing data, assuming the consumer will consume it. But as the consumer is dead, the queue becomes full and queue.put() blocks forever. Solution: if an item cannot be added to the queue, check every 10 seconds whether the consumer is alive. If it is not alive, throw an exception; if it is alive, try again. - [ ] Any interfaces changed? no - [ ] Any backward compatibility impacted? no - [ ] Document update required? no - [ ] Testing done updated the testcase in TestNonTransactionalCarbonTable.scala - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. NA You can merge this pull request into a Git repository by running: $ git pull https://github.com/ajantha-bhat/carbondata branch3 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/2251.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2251 commit 7f19372fc32a6978ac5aa87f20f346e75a9b1e9d Author: ajantha-bhat Date: 2018-04-28T12:47:35Z [CARBONDATA-2417] problem: SDK writer goes to infinite wait when consumer thread is dead root cause: when a bad record causes an exception in the consumer thread during write, the failure is not propagated to the producer (the SDK writer). So the SDK keeps writing data, assuming the consumer will consume it. But as the consumer is dead, the queue becomes full and queue.put() blocks forever. Solution: if an item cannot be added to the queue, check every 10 seconds whether the consumer is alive.
If it is not alive, throw an exception; if it is alive, try again. ---
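The solution described above (a timed enqueue with a consumer liveness check between attempts) can be sketched with the standard java.util.concurrent API. This is only an illustration, not the PR's code: putOrFail and the consumerAlive callback are hypothetical names, and the timeout is a parameter here rather than the fixed 10 seconds mentioned in the description.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

public class ProducerLivenessSketch {
    /**
     * Tries to enqueue with a timeout instead of blocking forever in put().
     * After each timed-out attempt it asks whether the consumer is still alive.
     */
    static <T> void putOrFail(BlockingQueue<T> queue, T item, BooleanSupplier consumerAlive,
            long timeout, TimeUnit unit) throws InterruptedException {
        while (!queue.offer(item, timeout, unit)) {
            if (!consumerAlive.getAsBoolean()) {
                throw new IllegalStateException("consumer is dead; cannot enqueue");
            }
            // Consumer is alive but busy: loop and try again.
        }
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(1);
        // The queue has room, so this succeeds without consulting the liveness check.
        putOrFail(queue, "row-1", () -> false, 10, TimeUnit.MILLISECONDS);
        try {
            // The queue is full and nobody drains it: the producer fails fast instead of hanging.
            putOrFail(queue, "row-2", () -> false, 10, TimeUnit.MILLISECONDS);
        } catch (IllegalStateException e) {
            System.out.println("failed fast: " + e.getMessage());
        }
    }
}
```

In a real writer the liveness callback would typically wrap something like Thread.isAlive() or Future.isDone() for the consumer task; which one applies depends on how the consumer is started.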
[jira] [Created] (CARBONDATA-2417) SDK writer goes to infinite wait when consumer thread goes dead
Ajantha Bhat created CARBONDATA-2417: Summary: SDK writer goes to infinite wait when consumer thread goes dead Key: CARBONDATA-2417 URL: https://issues.apache.org/jira/browse/CARBONDATA-2417 Project: CarbonData Issue Type: Bug Reporter: Ajantha Bhat Assignee: Ajantha Bhat problem: SDK writer goes to infinite wait when consumer thread is dead root cause: when a bad record causes an exception in the consumer thread during write, the failure is not propagated to the producer (the SDK writer). So the SDK keeps writing data, assuming the consumer will consume it. But as the consumer is dead, the queue becomes full and queue.put() blocks forever. Solution: if an item cannot be added to the queue, check every 10 seconds whether the consumer is alive. If it is not alive, throw an exception; if it is alive, try again. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] carbondata pull request #2209: [CARBONDATA-2388][SDK]Avro Record Complex Typ...
Github user ajantha-bhat commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2209#discussion_r184905554 --- Diff: core/src/main/java/org/apache/carbondata/core/metadata/schema/SchemaReader.java --- @@ -81,10 +81,13 @@ public static TableInfo getTableInfo(AbsoluteTableIdentifier identifier) } public static TableInfo inferSchema(AbsoluteTableIdentifier identifier, - boolean isCarbonFileProvider) throws IOException { + boolean isCarbonFileProvider, TableInfo tableInfoFromCache) throws IOException { // This routine is going to infer schema from the carbondata file footer // Convert the ColumnSchema -> TableSchema -> TableInfo. // Return the TableInfo. +if (tableInfoFromCache != null) { --- End diff -- If tableInfoFromCache is found, there is no need to call inferSchema at all; please handle this check outside inferSchema. ---
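The review suggestion above amounts to keeping the cache lookup at the call site so that the inference routine has a single job. A minimal sketch, with hypothetical names (resolveTableInfo, and a String standing in for TableInfo):

```java
public class CacheFirstSketch {
    static int inferCalls = 0;

    /** Hypothetical stand-in for the expensive routine that reads a file footer. */
    static String inferSchema() {
        inferCalls++;
        return "schema-from-footer";
    }

    /** The caller decides: use the cached value if present, otherwise infer. */
    static String resolveTableInfo(String tableInfoFromCache) {
        if (tableInfoFromCache != null) {
            return tableInfoFromCache; // inferSchema is never entered
        }
        return inferSchema();
    }

    public static void main(String[] args) {
        System.out.println(resolveTableInfo("schema-from-cache"));
        System.out.println(resolveTableInfo(null));
        System.out.println("inferSchema calls: " + inferCalls);
    }
}
```

With this shape, inferSchema keeps its original two-argument signature and does not need to know that a cache exists.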
[GitHub] carbondata pull request #2209: [WIP][Non Transactional Table]Avro Record Com...
Github user ajantha-bhat commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2209#discussion_r184904922 --- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/createTable/TestNonTransactionalCarbonTable.scala --- @@ -175,9 +175,31 @@ class TestNonTransactionalCarbonTable extends QueryTest with BeforeAndAfterAll { test("test create External Table with Schema with partition, should ignore schema and partition") { -buildTestDataSingleFile() +sql("DROP TABLE IF EXISTS sdkOutputTable") + +// with partition +sql("CREATE EXTERNAL TABLE sdkOutputTable(name string) PARTITIONED BY (age int) STORED BY 'carbondata' LOCATION '/home/root1/avro/files' ") +// +//checkAnswer(sql("select * from sdkOutputTable"), Seq(Row("robot0", 0, 0.0), +// Row("robot1", 1, 0.5), +// Row("robot2", 2, 1.0))) --- End diff -- please revert back this test case and add new one ---
[GitHub] carbondata issue #2240: [CARBONDATA-2313] fixed multiple issues in SDK write...
Github user ajantha-bhat commented on the issue: https://github.com/apache/carbondata/pull/2240 retest this please ---
[GitHub] carbondata pull request #2240: [CARBONDATA-2313] fixed multiple issues in SD...
GitHub user ajantha-bhat opened a pull request: https://github.com/apache/carbondata/pull/2240 [CARBONDATA-2313] fixed multiple issues in SDK writer and external table with NonTransactional table data [CARBONDATA-2313] fixed multiple issues in SDK writer and external table with NonTransactional table data *Header update for sdk interface api. *bad record path issue in sdk writer, should not be "null/null/null/taskno" changed to "sdkBadRecords/taskno". *Non transactional table, Number format exception was coming instead of bad record exception when load fails due to bad record. *Non transactional table, insert overwrite failure case old files must not be deleted. *Non transactional table, describe formatted path should be files path. *SDK, default all dimensions must be inverted index encoding. *SDK, avro not supporting float datatype. Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily: - [ ] Any interfaces changed?No - [ ] Any backward compatibility impacted?No - [ ] Document update required?NA - [ ] Testing done updated UT test cases. - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. 
NA You can merge this pull request into a Git repository by running: $ git pull https://github.com/ajantha-bhat/carbondata unmanaged_table Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/2240.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2240 commit afe1781a268a45e03072fc319ea214b1e580fbd4 Author: ajantha-bhat Date: 2018-04-23T10:47:29Z [CARBONDATA-2313] fixed multiple issues in SDK writer and external table with nonTransactional table data *Header update for sdk interface api *bad record path issue in sdk writer, should not be "null/null/null/taskno" changed to "sdkBadRecords/taskno" *Non transactional table, Number format exception was coming instead of bad record exception when load fails due to bad record *Non transactional table, insert overwrite failure case old files must not be deleted *Non transactional table, describe formatted path should be files path *SDK, default all dimensions must be inverted index encoding *SDK, avro not supporting float datatype ---
[GitHub] carbondata issue #2212: [CARBONDATA-2313] fixed issue in query when multiple...
Github user ajantha-bhat commented on the issue: https://github.com/apache/carbondata/pull/2212 closing the PR. Need to handle by a different solution. ---
[GitHub] carbondata pull request #2212: [CARBONDATA-2313] fixed issue in query when m...
Github user ajantha-bhat closed the pull request at: https://github.com/apache/carbondata/pull/2212 ---
[GitHub] carbondata pull request #2220: [CARBONDATA-2369] FAQ update related to carbo...
Github user ajantha-bhat commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2220#discussion_r183928302
--- Diff: docs/faq.md ---
@@ -182,3 +183,15 @@ select cntry,sum(gdp) from gdp21,pop1 where cntry=ctry group by cntry;
 ## Why all executors are showing success in Spark UI even after Dataload command failed at Driver side?
 Spark executor shows task as failed after the maximum number of retry attempts, but loading the data having bad records and BAD_RECORDS_ACTION (carbon.bad.records.action) is set as "FAIL" will attempt only once but will send the signal to driver as failed instead of throwing the exception to retry, as there is no point to retry if bad record found and BAD_RECORDS_ACTION is set to fail. Hence the Spark executor displays this one attempt as successful but the command has actually failed to execute. Task attempts or executor logs can be checked to observe the failure reason.
+## Why different time zone result for select query output when query SDK writer output?
+SDK writer is an independent entity, hence SDK writer can generate carbondata files from a non-cluster machine that has different time zones. But at cluster when those files are read, it always takes cluster time-zone. Hence, the value of timestamp and date datatype fields are not original value.
+If you do not want to see according to time-zone, then set cluster's time-zone in SDK writer by calling below API.
--- End diff --
Done. Will take these changes in #2198.
---
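The time-zone effect described in the FAQ text above can be reproduced with plain JDK classes. This is a minimal sketch, assuming a timestamp is kept as a zone-independent instant and only rendered with the reader's zone; the class name, the `render` helper and the two zone IDs are illustrative and say nothing about how CarbonData stores timestamps internally:

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

final class TimeZoneShiftSketch {

    private static final DateTimeFormatter FORMAT =
        DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    // Renders the same instant as wall-clock text in the given zone.
    static String render(long epochMillis, String zoneId) {
        return FORMAT.format(Instant.ofEpochMilli(epochMillis).atZone(ZoneId.of(zoneId)));
    }

    public static void main(String[] args) {
        long stored = 0L; // one instant, as a writer machine might have recorded it
        System.out.println(render(stored, "UTC"));          // 1970-01-01 00:00:00
        System.out.println(render(stored, "Asia/Kolkata")); // 1970-01-01 05:30:00
    }
}
```

The stored value never changes; only the zone used for rendering differs, which is why aligning the writer's zone with the cluster's zone makes the displayed values agree.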
[GitHub] carbondata pull request #2220: [CARBONDATA-2369] FAQ update related to carbo...
Github user ajantha-bhat closed the pull request at: https://github.com/apache/carbondata/pull/2220 ---
[GitHub] carbondata issue #2220: [CARBONDATA-2369] FAQ update related to carbon SDK s...
Github user ajantha-bhat commented on the issue: https://github.com/apache/carbondata/pull/2220 This will be handled in #2198. No need of separate PR ---
[GitHub] carbondata pull request #2220: [CARBONDATA-2369] FAQ update related to carbo...
GitHub user ajantha-bhat opened a pull request: https://github.com/apache/carbondata/pull/2220 [CARBONDATA-2369] FAQ update related to carbon SDK scenario [CARBONDATA-2369] FAQ update related to carbon SDK scenario - [ ] Any interfaces changed? no - [ ] Any backward compatibility impacted? no - [ ] Document update required? yes, updated - [ ] Testing done. NA You can merge this pull request into a Git repository by running: $ git pull https://github.com/ajantha-bhat/carbondata faq Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/2220.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2220 commit 6ee180f0c4f11207e73b19a46ab48ba01ec7128a Author: ajantha-bhat Date: 2018-04-24T08:20:35Z [CARBONDATA-2369] FAQ update related to SDK scenario ---
[GitHub] carbondata pull request #2198: [CARBONDATA-2369] Add a document for Non Tran...
Github user ajantha-bhat commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2198#discussion_r183625844 --- Diff: docs/data-management-on-carbondata.md --- @@ -174,6 +174,50 @@ This tutorial is going to introduce all commands and data operations on CarbonDa ``` +## CREATE EXTERNAL TABLE + This function allows user to create external table by specifying location. + ``` + CREATE EXTERNAL TABLE [IF NOT EXISTS] [db_name.]table_name + STORED BY 'carbondata' LOCATION '$FilesPath' + ``` + +### Create external table on managed table data location. + Managed table data location provided will have both FACT and Metadata folder. + This data can be generated by creating a normal carbon table and use this path as $FilesPath in the above syntax. + + **Example:** + ``` + sql("CREATE TABLE origin(key INT, value STRING) STORED BY 'carbondata'") + sql("INSERT INTO origin select 100,'spark'") + sql("INSERT INTO origin select 200,'hive'") + // creates a table in $storeLocation/origin + + sql(s""" + |CREATE EXTERNAL TABLE source + |STORED BY 'carbondata' + |LOCATION '$storeLocation/origin' + """.stripMargin) + checkAnswer(sql("SELECT count(*) from source"), sql("SELECT count(*) from origin")) + ``` + +### Create external table on Non-Transactional table data location. --- End diff -- done. Modified. ---
[GitHub] carbondata issue #2198: [CARBONDATA-2369] Add a document for Non Transaction...
Github user ajantha-bhat commented on the issue: https://github.com/apache/carbondata/pull/2198 retest this please ---
[jira] [Updated] (CARBONDATA-2313) Support Non Transactional carbon table
[ https://issues.apache.org/jira/browse/CARBONDATA-2313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat updated CARBONDATA-2313: - Attachment: carbon NonTranscational Table.pdf > Support Non Transactional carbon table > -- > > Key: CARBONDATA-2313 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2313 > Project: CarbonData > Issue Type: New Feature > Reporter: Ajantha Bhat >Priority: Major > Attachments: carbon NonTranscational Table.pdf > > Time Spent: 18h 50m > Remaining Estimate: 0h > > h5. Support unmanaged carbon table -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (CARBONDATA-2313) Support Non Transactional carbon table
[ https://issues.apache.org/jira/browse/CARBONDATA-2313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat updated CARBONDATA-2313: - Attachment: (was: carbon unmanaged table desgin doc_V1.0.pdf) > Support Non Transactional carbon table > -- > > Key: CARBONDATA-2313 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2313 > Project: CarbonData > Issue Type: New Feature > Reporter: Ajantha Bhat >Priority: Major > Attachments: carbon NonTranscational Table.pdf > > Time Spent: 18h 50m > Remaining Estimate: 0h > > h5. Support unmanaged carbon table -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] carbondata pull request #2212: [CARBONDATA-2313] fixed issue in query when m...
GitHub user ajantha-bhat opened a pull request: https://github.com/apache/carbondata/pull/2212
[CARBONDATA-2313] fixed issue in query when multiple sdk writer's out…
Problem: [Non-transactional table] a query returns incomplete results when output files from multiple SDK writers, with the same column name and the same UUID, are placed in a single path.
Scenario: copy one set of SDK writer output and run a select query. Copy another set of SDK writer output with the same column name and the same UUID, then run the select query again. The select query returns only the first set's output.
Root cause: the segment map was not updated with the latest files when the newly placed files belonged to the same segment.
Solution: for non-transactional tables, refresh the segment map during each query if the list of carbonindex files has changed.
Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily:
- [ ] Any interfaces changed? NA
- [ ] Any backward compatibility impacted? NA
- [ ] Document update required? NA
- [ ] Testing done. Done; updated the test case with the same UUID.
You can merge this pull request into a Git repository by running: $ git pull https://github.com/ajantha-bhat/carbondata unmanaged_table
Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/2212.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2212
commit 92d85cc706de0345b9a7cfe0eda81f7e92139235
Author: ajantha-bhat
Date: 2018-04-23T10:47:29Z
[CARBONDATA-2313] fixed issue in query when multiple sdk writer's output files with same column name with same UUID is placed in single path
---
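The solution described in the PR above (refresh the segment map when the set of carbonindex files changes) can be sketched as a cache keyed by segment that compares the current file listing with the one it last saw. This is an illustration only: `SegmentIndexCacheSketch`, `refreshIfChanged` and the file names are hypothetical, and the PR itself was later closed in favour of a different solution.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class SegmentIndexCacheSketch {

    // Last index-file listing seen for each segment id.
    private final Map<String, Set<String>> indexFilesBySegment = new HashMap<>();

    // Returns true when the cached listing was stale (or absent) and has been replaced.
    boolean refreshIfChanged(String segmentId, Set<String> currentIndexFiles) {
        Set<String> cached = indexFilesBySegment.get(segmentId);
        if (currentIndexFiles.equals(cached)) {
            return false; // same files as last query, keep the cached entry
        }
        indexFilesBySegment.put(segmentId, new HashSet<>(currentIndexFiles));
        return true;
    }

    public static void main(String[] args) {
        SegmentIndexCacheSketch cache = new SegmentIndexCacheSketch();
        Set<String> firstCopy = new HashSet<>(Arrays.asList("a.carbonindex"));
        Set<String> secondCopy = new HashSet<>(Arrays.asList("a.carbonindex", "b.carbonindex"));
        System.out.println(cache.refreshIfChanged("seg", firstCopy));  // first query loads
        System.out.println(cache.refreshIfChanged("seg", firstCopy));  // unchanged listing
        System.out.println(cache.refreshIfChanged("seg", secondCopy)); // new files detected
    }
}
```

Comparing whole sets rather than counts means a replaced file with a different name is also detected, at the cost of listing the directory on every query.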
[GitHub] carbondata pull request #2190: [CARBONDATA-2359] Support applicable load opt...
Github user ajantha-bhat commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2190#discussion_r183274384 --- Diff: store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonWriterBuilder.java --- @@ -123,6 +125,43 @@ public CarbonWriterBuilder uniqueIdentifier(long UUID) { return this; } + /** + * To support the load options for sdk writer + * @param options key,value pair of load options. + *supported keys values are + *a. bad_records_logger_enable -- true, false --- End diff -- ok ---
[GitHub] carbondata issue #2190: [CARBONDATA-2359] Support applicable load options an...
Github user ajantha-bhat commented on the issue: https://github.com/apache/carbondata/pull/2190 retest this please ---
[GitHub] carbondata pull request #2190: [CARBONDATA-2359] Support applicable load opt...
Github user ajantha-bhat commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2190#discussion_r183242493 --- Diff: core/src/main/java/org/apache/carbondata/core/metadata/schema/table/TableSchemaBuilder.java --- @@ -75,6 +78,11 @@ public TableSchemaBuilder tableName(String tableName) { return this; } + public TableSchemaBuilder resetTransactionalTable(boolean isTransactionalTable) { --- End diff -- ok ---
[GitHub] carbondata pull request #2190: [CARBONDATA-2359] Support applicable load opt...
Github user ajantha-bhat commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2190#discussion_r183242496 --- Diff: store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonWriterBuilder.java --- @@ -210,11 +260,26 @@ private CarbonTable buildCarbonTable() { tableSchemaBuilder = tableSchemaBuilder.blockletSize(blockletSize); } -List sortColumnsList; -if (sortColumns != null) { - sortColumnsList = Arrays.asList(sortColumns); +if (!isTransactionalTable) { --- End diff -- ok ---
[GitHub] carbondata issue #2190: [CARBONDATA-2359] Support applicable load options an...
Github user ajantha-bhat commented on the issue: https://github.com/apache/carbondata/pull/2190 Retest this please ---
[GitHub] carbondata pull request #2190: [CARBONDATA-2359] Support applicable load opt...
Github user ajantha-bhat commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2190#discussion_r183227722 --- Diff: store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonWriterBuilder.java --- @@ -196,11 +235,26 @@ private CarbonTable buildCarbonTable() { tableSchemaBuilder = tableSchemaBuilder.blockletSize(blockletSize); } -List sortColumnsList; -if (sortColumns != null) { - sortColumnsList = Arrays.asList(sortColumns); +if (isUnManagedTable) { + tableSchemaBuilder = tableSchemaBuilder.isUnmanagedTable(isUnManagedTable); +} + +List sortColumnsList = new ArrayList<>(); --- End diff -- ok. done ---
[GitHub] carbondata pull request #2190: [CARBONDATA-2359] Support applicable load opt...
Github user ajantha-bhat commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2190#discussion_r183227711
--- Diff: store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonWriterBuilder.java ---
@@ -123,6 +125,43 @@ public CarbonWriterBuilder uniqueIdentifier(long UUID) {
 return this;
 }
+ /**
+ * To support the load options for sdk writer
+ * @param options key,value pair of load options.
+ *supported keys values are
+ *a. bad_records_logger_enable -- true, false
+ *b. bad_records_action -- FAIL, FORCE, IGNORE, REDIRECT
+ *c. bad_record_path -- path
+ *d. dateformat -- same as JAVA SimpleDateFormat
+ *e. timestampformat -- same as JAVA SimpleDateFormat
+ * @return updated CarbonWriterBuilder
+ */
+ public CarbonWriterBuilder withLoadOptions(Map options) {
--- End diff --
ok. done
---
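The Javadoc in the diff above lists five supported keys and their allowed values. The check below is a minimal sketch of how such a map could be validated before use; it is not the `withLoadOptions` implementation from the PR. `LoadOptionsSketch` and `validate` are hypothetical names, and exact (case-sensitive) matching of keys and values is an assumption.

```java
import java.text.SimpleDateFormat;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class LoadOptionsSketch {

    private static final Set<String> ACTIONS =
        new HashSet<>(Arrays.asList("FAIL", "FORCE", "IGNORE", "REDIRECT"));

    // Throws IllegalArgumentException for an unknown key or a value outside the listed set.
    static void validate(Map<String, String> options) {
        for (Map.Entry<String, String> entry : options.entrySet()) {
            String value = entry.getValue();
            switch (entry.getKey()) {
                case "bad_records_logger_enable":
                    if (!"true".equals(value) && !"false".equals(value)) {
                        throw new IllegalArgumentException("Expected true or false: " + value);
                    }
                    break;
                case "bad_records_action":
                    if (!ACTIONS.contains(value)) {
                        throw new IllegalArgumentException("Unknown action: " + value);
                    }
                    break;
                case "bad_record_path":
                    break; // any path string is accepted in this sketch
                case "dateformat":
                case "timestampformat":
                    // The constructor rejects invalid patterns with IllegalArgumentException.
                    new SimpleDateFormat(value);
                    break;
                default:
                    throw new IllegalArgumentException("Unsupported load option: " + entry.getKey());
            }
        }
    }

    public static void main(String[] args) {
        Map<String, String> options = new HashMap<>();
        options.put("bad_records_action", "REDIRECT");
        options.put("bad_record_path", "/tmp/badrecords");
        options.put("timestampformat", "yyyy-MM-dd HH:mm:ss");
        validate(options);
        System.out.println("options accepted");
    }
}
```

Rejecting unknown keys up front keeps a typo in an option name from being silently ignored during a load.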
[GitHub] carbondata pull request #2190: [CARBONDATA-2359] Support applicable load opt...
Github user ajantha-bhat commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2190#discussion_r183227700 --- Diff: core/src/main/java/org/apache/carbondata/core/metadata/schema/table/TableSchemaBuilder.java --- @@ -119,7 +126,18 @@ public TableSchemaBuilder addColumn(StructField field, boolean isSortColumn) { } newColumn.setSchemaOrdinal(ordinal++); newColumn.setColumnar(true); -newColumn.setColumnUniqueId(UUID.randomUUID().toString()); + +// For unmanagedTable, multiple sdk writer output with same column name can be placed in --- End diff -- ok. done ---
[GitHub] carbondata issue #2177: [CARBONDATA-2360][Non Transactional Table]Insert int...
Github user ajantha-bhat commented on the issue: https://github.com/apache/carbondata/pull/2177 Retest this please ---
[GitHub] carbondata issue #2190: [CARBONDATA-2359] Support applicable load options an...
Github user ajantha-bhat commented on the issue: https://github.com/apache/carbondata/pull/2190 Retest this please ---
[GitHub] carbondata issue #2190: [CARBONDATA-2359] Support applicable load options an...
Github user ajantha-bhat commented on the issue: https://github.com/apache/carbondata/pull/2190 Retest this Please ---
[GitHub] carbondata issue #2190: [CARBONDATA-2359] Support applicable load options an...
Github user ajantha-bhat commented on the issue: https://github.com/apache/carbondata/pull/2190 Retest this please ---
[GitHub] carbondata pull request #2198: [CARBONDATA-2369] Add a document for Non Tran...
Github user ajantha-bhat commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2198#discussion_r183035777
--- Diff: docs/sdk-writer-guide.md ---
@@ -0,0 +1,140 @@
+# SDK Writer Guide
+In the carbon jars package, there exist a carbondata-store-sdk-x.x.x-SNAPSHOT.jar.
+This SDK writer, writes carbondata file and carbonindex file at a given path.
+External client can make use of this writer to convert other format data or live data to create carbondata and index files.
+These SDK writer output contains just a carbondata and carbonindex files. No metadata folder will be present.
+
+## Quick example
+
+```scala
+ import java.io.IOException;
+
+ import org.apache.carbondata.common.exceptions.sql.InvalidLoadOptionException;
+ import org.apache.carbondata.core.metadata.datatype.DataTypes;
+ import org.apache.carbondata.sdk.file.CarbonWriter;
+ import org.apache.carbondata.sdk.file.CarbonWriterBuilder;
+ import org.apache.carbondata.sdk.file.Field;
+ import org.apache.carbondata.sdk.file.Schema;
+
+ public class TestSdk {
+
+   public static void main(String[] args) throws IOException, InvalidLoadOptionException {
+     testSdkWriter();
+   }
+
+   public static void testSdkWriter() throws IOException, InvalidLoadOptionException {
+     String path = "/home/root1/Documents/ab/temp";
+
+     Field[] fields = new Field[2];
+     fields[0] = new Field("name", DataTypes.STRING);
+     fields[1] = new Field("age", DataTypes.INT);
+
+     Schema schema = new Schema(fields);
+
+     CarbonWriterBuilder builder = CarbonWriter.builder()
+       .withSchema(schema)
+       .outputPath(path);
+
+     CarbonWriter writer = builder.buildWriterForCSVInput();
+
+     int rows = 5;
+     for (int i = 0; i < rows; i++) {
+       writer.write(new String[]{"robot" + (i % 10), String.valueOf(i)});
+     }
+     writer.close();
+   }
+ }
+```
+
+## Datatypes Mapping
+Each of SQL data types are mapped into data types of SDK. Following are the mapping:
+| SQL DataTypes | Mapped SDK DataTypes |
--- End diff --
ok. Fixed it.
---
[GitHub] carbondata pull request #2198: [CARBONDATA-2369] Add a document for Non Tran...
GitHub user ajantha-bhat opened a pull request: https://github.com/apache/carbondata/pull/2198 [CARBONDATA-2369] Add a document for Non Transactional table with SDK writer guide [CARBONDATA-2369] Add a document for Non Transactional table with SDK writer guide As per PR#2131 [CARBONDATA-2313] Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily: - [ ] Any interfaces changed? No - [ ] Any backward compatibility impacted? No - [ ] Document update required? yes, updated - [ ] Testing done -- NA - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. You can merge this pull request into a Git repository by running: $ git pull https://github.com/ajantha-bhat/carbondata master_doc Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/2198.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2198 commit 4506e44f75f7723a9e1b18c110a9b68bdbe0582d Author: ajantha-bhat Date: 2018-04-20T11:06:37Z [CARBONDATA-2369] Add a document for Non Transactional table with SDK writer guide ---
[jira] [Created] (CARBONDATA-2369) Add a document for Non Transactional table with SDK writer guide
Ajantha Bhat created CARBONDATA-2369: Summary: Add a document for Non Transactional table with SDK writer guide Key: CARBONDATA-2369 URL: https://issues.apache.org/jira/browse/CARBONDATA-2369 Project: CarbonData Issue Type: Sub-task Reporter: Ajantha Bhat -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] carbondata issue #2190: [CARBONDATA-2359] Support applicable load options an...
Github user ajantha-bhat commented on the issue: https://github.com/apache/carbondata/pull/2190 Retest this please ---
[GitHub] carbondata pull request #2190: [CARBONDATA-2359] Support applicable load opt...
GitHub user ajantha-bhat opened a pull request: https://github.com/apache/carbondata/pull/2190
[CARBONDATA-2359] Support applicable load options and table properties for Non-Transactional table
Support applicable load options and table properties for a Non-Transactional table. Also blocked the clean files operation for a Non-Transactional table.
Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily:
- [ ] Any interfaces changed? No. Added new interfaces; did not modify any existing ones.
- [ ] Any backward compatibility impacted? No
- [ ] Document update required? Yes, will be handled in a separate PR
- [ ] Testing done. Please refer to TestUnmanagedCarbonTable.scala
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. Done.
You can merge this pull request into a Git repository by running: $ git pull https://github.com/ajantha-bhat/carbondata unmanaged_table
Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/2190.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2190
commit 5f9aabaa7ed6498357c806617da2f00fceebbaf3
Author: ajantha-bhat
Date: 2018-04-19T13:11:20Z
[CARBONDATA-2359] Support applicable load options and table properties for Non-Transactional table
---
[jira] [Created] (CARBONDATA-2359) Support applicable load options and table properties for unmanaged table
Ajantha Bhat created CARBONDATA-2359: Summary: Support applicable load options and table properties for unmanaged table Key: CARBONDATA-2359 URL: https://issues.apache.org/jira/browse/CARBONDATA-2359 Project: CarbonData Issue Type: Sub-task Reporter: Ajantha Bhat -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] carbondata issue #2141: [CARBONDATA-2313] Fixed SDK writer issues and added ...
Github user ajantha-bhat commented on the issue: https://github.com/apache/carbondata/pull/2141 retest this please ---
[GitHub] carbondata issue #2141: [CARBONDATA-2313] Fixed SDK writer issues and added ...
Github user ajantha-bhat commented on the issue: https://github.com/apache/carbondata/pull/2141 Retest this please ---
[GitHub] carbondata issue #2141: [CARBONDATA-2313] Fixed SDK writer issues and added ...
Github user ajantha-bhat commented on the issue: https://github.com/apache/carbondata/pull/2141 retest this please ---
[GitHub] carbondata issue #2141: [CARBONDATA-2313] Fixed SDK writer issues and added ...
Github user ajantha-bhat commented on the issue: https://github.com/apache/carbondata/pull/2141 retest this please ---
[GitHub] carbondata pull request #2141: [CARBONDATA-2313] Fixed SDK writer issues and...
Github user ajantha-bhat commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2141#discussion_r181650233
--- Diff: core/src/main/java/org/apache/carbondata/core/scan/executor/util/RestructureUtil.java ---
@@ -354,7 +369,13 @@ public static Object getMeasureDefaultValueByType(ColumnSchema columnSchema,
 // then setting measure exists is true
 // otherwise adding a default value of a measure
 for (CarbonMeasure carbonMeasure : currentBlockMeasures) {
-if (carbonMeasure.getColumnId().equals(queryMeasure.getMeasure().getColumnId())) {
+// If it is unmanaged table just check the column names, no need to validate column id as
+// multiple sdk's output placed in a single folder doesn't have same column ID but can
+// have same column name
+if (carbonMeasure.getColumnId().equals(queryMeasure.getMeasure().getColumnId()) ||
+((queryModel != null) && (queryModel.getTable().getTableInfo().isUnManagedTable()) &&
--- End diff --
Removed the null check. The comparison cannot be moved into a common method, because one call site works on the measure type and the other on the dimension type.
---
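The matching rule in the diff above (compare column IDs, and for an unmanaged table fall back to column names) can be sketched on its own. The diff is truncated before the name comparison, so the case-insensitive name check below is an assumption; `ColumnMatchSketch`, `Column` and `matches` are hypothetical names, not CarbonData classes.

```java
final class ColumnMatchSketch {

    // Minimal stand-in for a column with a unique id and a name.
    static final class Column {
        final String id;
        final String name;

        Column(String id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    // Same id always matches; for an unmanaged table, the same name is enough,
    // because files written by separate SDK writers carry different column ids.
    static boolean matches(Column blockColumn, Column queryColumn, boolean unmanagedTable) {
        if (blockColumn.id.equals(queryColumn.id)) {
            return true;
        }
        return unmanagedTable && blockColumn.name.equalsIgnoreCase(queryColumn.name);
    }

    public static void main(String[] args) {
        Column fromWriterA = new Column("id-1", "age");
        Column fromWriterB = new Column("id-2", "age");
        System.out.println(matches(fromWriterA, fromWriterB, false));
        System.out.println(matches(fromWriterA, fromWriterB, true));
    }
}
```

With the flag off, two writers' "age" columns are treated as different columns; with it on, they resolve to the same query column.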
[GitHub] carbondata pull request #2141: [CARBONDATA-2313] Fixed SDK writer issues and...
Github user ajantha-bhat commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2141#discussion_r181648541 --- Diff: core/src/main/java/org/apache/carbondata/core/scan/executor/util/RestructureUtil.java --- @@ -337,11 +351,12 @@ public static Object getMeasureDefaultValueByType(ColumnSchema columnSchema, * @param blockExecutionInfo * @param queryMeasuresmeasures present in query * @param currentBlockMeasures current block measures + * @param queryModel carbonQueryModel * @return measures present in the block */ public static List createMeasureInfoAndGetCurrentBlockQueryMeasures( BlockExecutionInfo blockExecutionInfo, List queryMeasures, - List currentBlockMeasures) { + List currentBlockMeasures, QueryModel queryModel) { --- End diff -- OK ---
[GitHub] carbondata pull request #2141: [CARBONDATA-2313] Fixed SDK writer issues and...
Github user ajantha-bhat commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2141#discussion_r181648562 --- Diff: core/src/main/java/org/apache/carbondata/core/scan/executor/util/RestructureUtil.java --- @@ -354,7 +369,13 @@ public static Object getMeasureDefaultValueByType(ColumnSchema columnSchema, // then setting measure exists is true // otherwise adding a default value of a measure for (CarbonMeasure carbonMeasure : currentBlockMeasures) { -if (carbonMeasure.getColumnId().equals(queryMeasure.getMeasure().getColumnId())) { +// If it is unmanaged table just check the column names, no need to validate column id as +// multiple sdk's output placed in a single folder doesn't have same column ID but can +// have same column name +if (carbonMeasure.getColumnId().equals(queryMeasure.getMeasure().getColumnId()) || +((queryModel != null) && (queryModel.getTable().getTableInfo().isUnManagedTable()) && --- End diff -- ok ---
[GitHub] carbondata pull request #2141: [CARBONDATA-2313] Fixed SDK writer issues and...
Github user ajantha-bhat commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2141#discussion_r181647929 --- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/createTable/TestUnmanagedCarbonTable.scala --- @@ -266,16 +270,23 @@ class TestUnmanagedCarbonTable extends QueryTest with BeforeAndAfterAll { .contains("Unsupported operation on unmanaged table")) //12. Streaming table creation -// External table don't accept table properties +// No need as External table don't accept table properties + +//13. Alter table rename command +exception = intercept[MalformedCarbonCommandException] { + sql("ALTER TABLE sdkOutputTable RENAME to newTable") +} +assert(exception.getMessage() + .contains("Unsupported operation on unmanaged table")) sql("DROP TABLE sdkOutputTable") -// drop table should not delete the files + //drop table should not delete the files --- End diff -- done ---
[GitHub] carbondata pull request #2141: [CARBONDATA-2313] Fixed SDK writer issues and...
Github user ajantha-bhat commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2141#discussion_r181646642 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/schema/CarbonAlterTableRenameCommand.scala --- @@ -73,22 +73,27 @@ private[sql] case class CarbonAlterTableRenameCommand( s"Table $oldDatabaseName.$oldTableName does not exist") throwMetadataException(oldDatabaseName, oldTableName, "Table does not exist") } + +var carbonTable: CarbonTable = null +carbonTable = metastore.lookupRelation(Some(oldDatabaseName), oldTableName)(sparkSession) --- End diff -- OK. modified ---
[GitHub] carbondata pull request #2141: [CARBONDATA-2313] Fixed SDK writer issues and...
Github user ajantha-bhat commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2141#discussion_r181646540 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/hive/CarbonFileMetastore.scala --- @@ -565,7 +563,9 @@ class CarbonFileMetastore extends CarbonMetaStore { tableModifiedTimeStore.get(CarbonCommonConstants.DATABASE_DEFAULT_NAME))) { metadata.carbonTables = metadata.carbonTables.filterNot( table => table.getTableName.equalsIgnoreCase(tableIdentifier.table) && - table.getDatabaseName.equalsIgnoreCase(tableIdentifier.database.getOrElse("default"))) + table.getDatabaseName.equalsIgnoreCase( --- End diff -- Ran all the test case. It is not coming now. Removed this changes ---
[GitHub] carbondata issue #2141: [CARBONDATA-2313] Fixed SDK writer issues and added ...
Github user ajantha-bhat commented on the issue: https://github.com/apache/carbondata/pull/2141 retest this please ---
[jira] [Updated] (CARBONDATA-2313) Support unmanaged carbon table
[ https://issues.apache.org/jira/browse/CARBONDATA-2313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat updated CARBONDATA-2313: - Description: h5. Support unmanaged carbon table (was: h1. Support unmanaged carbon table) > Support unmanaged carbon table > -- > > Key: CARBONDATA-2313 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2313 > Project: CarbonData > Issue Type: New Feature > Reporter: Ajantha Bhat >Priority: Major > Attachments: carbon unmanaged table desgin doc_V1.0.pdf > > Time Spent: 11h 40m > Remaining Estimate: 0h > > h5. Support unmanaged carbon table -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (CARBONDATA-2313) Support unmanaged carbon table
[ https://issues.apache.org/jira/browse/CARBONDATA-2313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat updated CARBONDATA-2313: - Description: h1. Support unmanaged carbon table > Support unmanaged carbon table > -- > > Key: CARBONDATA-2313 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2313 > Project: CarbonData > Issue Type: New Feature > Reporter: Ajantha Bhat >Priority: Major > Attachments: carbon unmanaged table desgin doc_V1.0.pdf > > Time Spent: 11h 40m > Remaining Estimate: 0h > > h1. Support unmanaged carbon table -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (CARBONDATA-2313) Support unmanaged carbon table
[ https://issues.apache.org/jira/browse/CARBONDATA-2313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16437058#comment-16437058 ] Ajantha Bhat commented on CARBONDATA-2313: -- Attached the design document. > Support unmanaged carbon table > -- > > Key: CARBONDATA-2313 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2313 > Project: CarbonData > Issue Type: New Feature > Reporter: Ajantha Bhat >Priority: Major > Attachments: carbon unmanaged table desgin doc_V1.0.pdf > > Time Spent: 11h 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (CARBONDATA-2313) Support unmanaged carbon table
[ https://issues.apache.org/jira/browse/CARBONDATA-2313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat updated CARBONDATA-2313: - Attachment: carbon unmanaged table desgin doc_V1.0.pdf > Support unmanaged carbon table > -- > > Key: CARBONDATA-2313 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2313 > Project: CarbonData > Issue Type: New Feature > Reporter: Ajantha Bhat >Priority: Major > Attachments: carbon unmanaged table desgin doc_V1.0.pdf > > Time Spent: 11h 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (CARBONDATA-2313) Support unmanaged carbon table
[ https://issues.apache.org/jira/browse/CARBONDATA-2313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat updated CARBONDATA-2313: - Attachment: (was: carbon unamanged table desgin doc_V1.0.pdf) > Support unmanaged carbon table > -- > > Key: CARBONDATA-2313 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2313 > Project: CarbonData > Issue Type: New Feature > Reporter: Ajantha Bhat >Priority: Major > Time Spent: 11h 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] carbondata issue #2141: [CARBONDATA-2313] Fixed SDK writer issues and added ...
Github user ajantha-bhat commented on the issue: https://github.com/apache/carbondata/pull/2141 retest this please ---
[GitHub] carbondata issue #2141: [CARBONDATA-2313] Fixed SDK writer issues and added ...
Github user ajantha-bhat commented on the issue: https://github.com/apache/carbondata/pull/2141 @jackylk & @gvramana : please review this PR. ---
[GitHub] carbondata issue #2141: [CARBONDATA-2313] Fixed SDK writer issues and added ...
Github user ajantha-bhat commented on the issue: https://github.com/apache/carbondata/pull/2141 retest this please ---
[jira] [Updated] (CARBONDATA-2313) Support unmanaged carbon table
[ https://issues.apache.org/jira/browse/CARBONDATA-2313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat updated CARBONDATA-2313: - Summary: Support unmanaged carbon table (was: Support Reading unmanaged carbon table) > Support unmanaged carbon table > -- > > Key: CARBONDATA-2313 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2313 > Project: CarbonData > Issue Type: New Feature > Reporter: Ajantha Bhat >Priority: Major > Attachments: carbon unamanged table desgin doc_V1.0.pdf > > Time Spent: 10h 40m > Remaining Estimate: 0h
[jira] [Updated] (CARBONDATA-2313) Support Reading unmanaged carbon table
[ https://issues.apache.org/jira/browse/CARBONDATA-2313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat updated CARBONDATA-2313: - Attachment: carbon unamanged table desgin doc_V1.0.pdf > Support Reading unmanaged carbon table > -- > > Key: CARBONDATA-2313 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2313 > Project: CarbonData > Issue Type: New Feature > Reporter: Ajantha Bhat >Priority: Major > Attachments: carbon unamanged table desgin doc_V1.0.pdf > > Time Spent: 10h 40m > Remaining Estimate: 0h
[GitHub] carbondata issue #2141: [CARBONDATA-2313] Fixed SDK writer issues and added ...
Github user ajantha-bhat commented on the issue: https://github.com/apache/carbondata/pull/2141 retest this please ---
[GitHub] carbondata issue #2141: [CARBONDATA-2313] Fixed SDK writer issues and added ...
Github user ajantha-bhat commented on the issue: https://github.com/apache/carbondata/pull/2141 retest this please ---
[GitHub] carbondata issue #2141: [CARBONDATA-2313] Fixed drop table issue in cluster ...
Github user ajantha-bhat commented on the issue: https://github.com/apache/carbondata/pull/2141 retest this please ---