[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3865: [CARBONDATA-3928] Handled the Strings which length is greater than 32000 as a bad record.
VenuReddy2103 commented on a change in pull request #3865:
URL: https://github.com/apache/carbondata/pull/3865#discussion_r474425224

## File path: integration/spark/src/test/scala/org/apache/carbondata/integration/spark/testsuite/dataload/TestLoadDataGeneral.scala

## @@ -145,47 +152,153 @@ class TestLoadDataGeneral extends QueryTest with BeforeAndAfterEach {
     sql("drop table if exists carbon_table")
   }

-  test("test insert / update with data more than 32000 characters") {
+  private def createTableAndLoadData(badRecordAction: String): Unit = {
+    BadRecordUtil.cleanBadRecordPath("default", "longerthan32kchar")
+    sql("CREATE TABLE longerthan32kchar(dim1 String, dim2 String, mes1 int) STORED AS carbondata")
+    sql(s"LOAD DATA LOCAL INPATH '$testdata' into table longerThan32kChar OPTIONS('FILEHEADER'='dim1,dim2,mes1', " +
+      s"'BAD_RECORDS_ACTION'='${badRecordAction}','BAD_RECORDS_LOGGER_ENABLE'='TRUE')")
+  }
+
+  test("test load / insert / update with data more than 32000 characters and bad record action as Redirect") {
+    createTableAndLoadData("REDIRECT")
+    var redirectCsvPath = BadRecordUtil.getRedirectCsvPath("default", "longerthan32kchar", "0", "0")
+    assert(BadRecordUtil.checkRedirectedCsvContentAvailableInSource(testdata, redirectCsvPath))
+    CarbonProperties.getInstance()
+      .addProperty(CarbonCommonConstants.CARBON_ENABLE_BAD_RECORD_HANDLING_FOR_INSERT, "true")
+    CarbonProperties.getInstance()
+      .addProperty(CarbonCommonConstants.CARBON_BAD_RECORDS_ACTION, "REDIRECT");
+    sql(s"insert into longerthan32kchar values('33000', '$longChar', 4)")
+    checkAnswer(sql("select * from longerthan32kchar"), Seq(Row("ok", "hi", 1), Row("itsok", "hello", 2)))
+    redirectCsvPath = BadRecordUtil.getRedirectCsvPath("default", "longerthan32kchar", "1", "0")
+    var redirectedFileLineList = FileUtils.readLines(redirectCsvPath)
+    var iterator = redirectedFileLineList.iterator()
+    while (iterator.hasNext) {
+      assert(iterator.next().equals("33000," + longChar + ",4"))
+    }
+
+    // Update strings of length greater than 32000
+    sql(s"update longerthan32kchar set(longerthan32kchar.dim2)=('$longChar') " +
+      "where longerthan32kchar.mes1=1").show()
+    checkAnswer(sql("select * from longerthan32kchar"), Seq(Row("itsok", "hello", 2)))
+    redirectCsvPath = BadRecordUtil.getRedirectCsvPath("default", "longerthan32kchar", "0", "1")
+    redirectedFileLineList = FileUtils.readLines(redirectCsvPath)
+    iterator = redirectedFileLineList.iterator()
+    while (iterator.hasNext) {
+      assert(iterator.next().equals("ok," + longChar + ",1"))
+    }
+    CarbonProperties.getInstance()
+      .addProperty(CarbonCommonConstants.CARBON_ENABLE_BAD_RECORD_HANDLING_FOR_INSERT, "false")
+
+    // Insert longer string without converter step will throw exception
+    intercept[Exception] {
+      sql(s"insert into longerthan32kchar values('32000', '$longChar', 3)")
+    }
+    BadRecordUtil.cleanBadRecordPath("default", "longerthan32kchar")
+  }
+
+  test("test load / insert / update with data more than 32000 characters and bad record action as Force") {
+    createTableAndLoadData("FORCE")
+    checkAnswer(sql("select * from longerthan32kchar"), Seq(Row("ok", "hi", 1), Row("itsok", "hello", 2), Row("32123", null, 3)))
     CarbonProperties.getInstance()
       .addProperty(CarbonCommonConstants.CARBON_ENABLE_BAD_RECORD_HANDLING_FOR_INSERT, "true")
-    val testdata = s"$resourcesPath/32000char.csv"
-    sql("drop table if exists load32000chardata")
-    sql("drop table if exists load32000chardata_dup")
-    sql("CREATE TABLE load32000chardata(dim1 String, dim2 String, mes1 int) STORED AS carbondata")
-    sql("CREATE TABLE load32000chardata_dup(dim1 String, dim2 String, mes1 int) STORED AS carbondata")
-    sql(s"LOAD DATA LOCAL INPATH '$testdata' into table load32000chardata OPTIONS('FILEHEADER'='dim1,dim2,mes1')")
+    CarbonProperties.getInstance()
+      .addProperty(CarbonCommonConstants.CARBON_BAD_RECORDS_ACTION, "FORCE");
+    sql(s"insert into longerthan32kchar values('33000', '$longChar', 4)")
+    checkAnswer(sql("select * from longerthan32kchar"),
+      Seq(Row("ok", "hi", 1), Row("itsok", "hello", 2), Row("32123", null, 3), Row("33000", null, 4)))
+
+    // Update strings of length greater than 32000
+    sql(s"update longerthan32kchar set(longerthan32kchar.dim2)=('$longChar') " +
+      "where longerthan32kchar.mes1=1").show()
+    checkAnswer(sql("select * from longerthan32kchar"),
+      Seq(Row("ok", null, 1), Row("itsok", "hello", 2), Row("32123", null, 3), Row("33000", null, 4)))
+    CarbonProperties.getInstance()
+      .addProperty(CarbonCommonConstants.CARBON_ENABLE_BAD_RECORD_HANDLING_FOR_INSERT, "false")
+
+    // Insert longer string without converter step will throw exception
     intercept[Exception] {
-      sql("insert into load32000chardata_dup select dim1,concat(load32000chardata.dim2,'
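The rule these tests exercise can be sketched in a few lines of plain Scala. This is an illustration only, not CarbonData's implementation: it assumes the 32000 limit applies to the character length of a plain (non long-string) column, and that a row containing any over-limit value is treated as a bad record, with BAD_RECORDS_ACTION deciding what happens to it.

```scala
// Hypothetical stand-in for CarbonData's bad-record check on string columns.
val maxStringLength = 32000

// A row is a bad record when any string field exceeds the limit.
def isBadRecord(fields: Seq[String]): Boolean =
  fields.exists(f => f != null && f.length > maxStringLength)

val longChar = "a" * 33000
assert(!isBadRecord(Seq("ok", "hi")))        // within the limit: row is kept
assert(isBadRecord(Seq("33000", longChar)))  // over 32000 chars: bad record
```

With REDIRECT, such rows land in a bad-record CSV (which the test reads back line by line); with FORCE, the over-limit value is stored as null, matching the `Row("33000", null, 4)` expectations above.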
[jira] [Resolved] (CARBONDATA-3925) flink-integration write carbon file to hdfs error
[ https://issues.apache.org/jira/browse/CARBONDATA-3925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ajantha Bhat resolved CARBONDATA-3925.
--------------------------------------
    Fix Version/s:     (was: 2.0.1)
       Resolution: Fixed

> flink-integration write carbon file to hdfs error
> -------------------------------------------------
>
>                 Key: CARBONDATA-3925
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-3925
>             Project: CarbonData
>          Issue Type: Bug
>          Components: flink-integration
>    Affects Versions: 2.0.0
>            Reporter: yutao
>            Priority: Major
>             Fix For: 2.1.0
>
>          Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> In the CarbonWriter.java code you can find this:
>
>   public abstract class CarbonWriter extends ProxyFileWriter {
>     private static final Logger LOGGER =
>         LogServiceFactory.getLogService(CarbonS3Writer.class.getName());
>
> so the log file always prints entries like the following, which is puzzling:
>
> 2020-07-27 14:19:25,107 DEBUG org.apache.carbon.flink.CarbonS3Writer
> 2020-08-19 13:14:13,329 INFO org.apache.carbondata.hadoop.api.CarbonTableOutputFormat - Closed writer task attempt_f229b922-1f77-426f-a4bc-42e49aa53df7__m_1968253873_-1049302646
> 2020-08-19 13:14:13,329 DEBUG org.apache.carbon.flink.CarbonLocalWriter - Commit write. org.apache.carbon.flink.CarbonLocalWriter@41f5c4a9
> 2020-08-19 13:14:13,329 DEBUG org.apache.carbon.flink.CarbonS3Writer - Upload file[/home/hadoop/yutest/d963e9836ccb4318aa8fc953af983d07/part-0-a132f98547584dcabae6c43090626baf_batchno0-0-null-1597814047953.snappy.carbondata] to [hdfs://beh/user/dc_cbss/warehouse/yutest/tf_b_trade/stage_data] start.
> 2020-08-19 13:14:13,329 INFO org.apache.carbondata.core.util.CarbonUtil - Copying /home/hadoop/yutest/d963e9836ccb4318aa8fc953af983d07/part-0-a132f98547584dcabae6c43090626baf_batchno0-0-null-1597814047953.snappy.carbondata to hdfs://beh/user/dc_cbss/warehouse/yutest/tf_b_trade/stage_data, operation id 1597814053329
> 2020-08-19 13:14:13,331 DEBUG org.apache.carbondata.core.util.CarbonUtil - The configured block size is 1 KB, the actual carbon file size is 277 KB, choose the max value 277 KB as the block size on HDFS
> 2020-08-19 13:14:13,331 DEBUG org.apache.carbondata.core.util.CarbonUtil - HDFS file block size for file: hdfs://beh/user/dc_cbss/warehouse/yutest/tf_b_trade/stage_data/part-0-a132f98547584dcabae6c43090626baf_batchno0-0-null-1597814047953.snappy.carbondata is 284160 (bytes)
> 2020-08-19 13:14:13,332 INFO org.apache.carbondata.processing.util.CarbonLoaderUtil - Deleted the local store location: /tmp/f97548ae6efc43d2ba269c9d35295bb9_attempt_f229b922-1f77-426f-a4bc-42e49aa53df7__m_1968253873_-1049302646 : Time taken: 2
> 2020-08-19 13:14:13,358 ERROR org.apache.carbon.flink.CarbonS3Writer - Problem while copying file from local store to carbon store
> org.apache.carbondata.core.datastore.exception.CarbonDataWriterException: Problem while copying file from local store to carbon store
>     at org.apache.carbondata.core.util.CarbonUtil.copyCarbonDataFileToCarbonStorePath(CarbonUtil.java:2694)
>     at org.apache.carbon.flink.CarbonWriter.uploadSegmentDataFiles(CarbonWriter.java:90)
>     at org.apache.carbon.flink.CarbonLocalWriter.commit(CarbonLocalWriter.java:155)
>     at org.apache.carbon.flink.CarbonLocalWriter.flush(CarbonLocalWriter.java:129)
>     at org.apache.flink.streaming.api.functions.sink.filesystem.BulkPartWriter.closeForCommit(BulkPartWriter.java:61)
>     at org.apache.flink.streaming.api.functions.sink.filesystem.Bucket.closePartFile(Bucket.java:239)
>     at org.apache.flink.streaming.api.functions.sink.filesystem.Bucket.prepareBucketForCheckpointing(Bucket.java:280)
>     at org.apache.flink.streaming.api.functions.sink.filesystem.Bucket.onReceptionOfCheckpoint(Bucket.java:253)
>     at org.apache.flink.streaming.api.functions.sink.filesystem.Buckets.snapshotActiveBuckets(Buckets.java:250)
>     at org.apache.flink.streaming.api.functions.sink.filesystem.Buckets.snapshotState(Buckets.java:241)
>     at org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink.snapshotState(StreamingFileSink.java:447)
>     at org.apache.flink.streaming.util.functions.StreamingFunctionUtils.trySnapshotFunctionState(StreamingFunctionUtils.java:118)
>     at org.apache.flink.streaming.util.functions.StreamingFunctionUtils.snapshotFunctionState(StreamingFunctionUtils.java:99)
>     at org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.snapshotState(AbstractUdfStreamOperator.java:90)
>     at org.apache.flink.streaming.api.operators.AbstractStreamOperator.snapshotState(AbstractStreamOperator.java:402)
>     at org.apache.flink.streaming.runtime.tasks.StreamTask$CheckpointingOperation.checkpointStreamOperator
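The DEBUG line from CarbonUtil in the log above describes the block-size choice at the heart of this issue: when the configured block size is smaller than the carbon file being copied, the larger of the two is used so the file still fits in one HDFS block. A minimal sketch of that choice (a hypothetical helper, not the actual CarbonUtil code):

```scala
// Choose the HDFS block size for a copied carbon file: take whichever is
// larger, the configured block size or the actual file size.
def chooseHdfsBlockSize(configuredBlockSize: Long, fileSize: Long): Long =
  math.max(configuredBlockSize, fileSize)

// From the log: configured 1 KB, actual file 277 KB -> 277 KB chosen.
assert(chooseHdfsBlockSize(1L * 1024, 277L * 1024) == 277L * 1024)
```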
[GitHub] [carbondata] asfgit closed pull request #3892: [CARBONDATA-3925] flink write carbon file to hdfs when file size is less than 1M,can't write
asfgit closed pull request #3892:
URL: https://github.com/apache/carbondata/pull/3892

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3865: [CARBONDATA-3928] Handled the Strings which length is greater than 32000 as a bad record.
VenuReddy2103 commented on a change in pull request #3865:
URL: https://github.com/apache/carbondata/pull/3865#discussion_r474418423

## File path: integration/spark/src/test/scala/org/apache/carbondata/integration/spark/testsuite/dataload/TestLoadDataGeneral.scala

+    // Insert longer string without converter step will throw exception
+    intercept[Exception] {

Review comment:
    After intercept, we can check for the string/substring we expect? Same in the below test case also.
[GitHub] [carbondata] nihal0107 commented on a change in pull request #3865: [CARBONDATA-3928] Handled the Strings which length is greater than 32000 as a bad record.
nihal0107 commented on a change in pull request #3865:
URL: https://github.com/apache/carbondata/pull/3865#discussion_r474419263

## File path: integration/spark/src/test/scala/org/apache/carbondata/integration/spark/testsuite/dataload/TestLoadDataGeneral.scala

+    // Insert longer string without converter step will throw exception
+    intercept[Exception] {

Review comment:
    We can't check the message here because this is a system-generated exception, not a user-formatted exception.
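For context on the disagreement above: ScalaTest's `intercept` does return the caught exception, so a test can still assert on a stable substring of even a system-generated message. Below is a minimal self-contained stand-in for `intercept` (the real one lives in ScalaTest's `Assertions`; the exception message here is made up for illustration):

```scala
import scala.reflect.ClassTag

// Simplified version of ScalaTest's intercept: run the body, expect it to
// throw T, and return the caught exception for further inspection.
def intercept[T <: Throwable](body: => Any)(implicit ct: ClassTag[T]): T =
  try {
    body
    throw new AssertionError("expected exception was not thrown")
  } catch {
    case t: Throwable if ct.runtimeClass.isInstance(t) => t.asInstanceOf[T]
  }

val ex = intercept[RuntimeException] {
  throw new RuntimeException("String length cannot exceed 32000 characters")
}
// The caught exception is returned, so its message can still be checked.
assert(ex.getMessage.contains("32000"))
```

Whether a substring of a system-generated message is stable enough across versions to assert on is a separate judgment call, which is what the thread is really about.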
[GitHub] [carbondata] ajantha-bhat commented on pull request #3892: [CARBONDATA-3925] flink write carbon file to hdfs when file size is less than 1M,can't write
ajantha-bhat commented on pull request #3892:
URL: https://github.com/apache/carbondata/pull/3892#issuecomment-678049710

LGTM. Thanks for the contribution! The Spark 2.3.4 build failure is a known flaky CI issue that we are working on, so merging this PR.
[GitHub] [carbondata] Karan980 commented on pull request #3876: TestingCI
Karan980 commented on pull request #3876:
URL: https://github.com/apache/carbondata/pull/3876#issuecomment-678048812

retest this please
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3892: [CARBONDATA-3925] flink write carbon file to hdfs when file size is less than 1M,can't write
CarbonDataQA1 commented on pull request #3892:
URL: https://github.com/apache/carbondata/pull/3892#issuecomment-678025176

Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3827/
[GitHub] [carbondata] xubo245 commented on pull request #3891: [CARBONDATA-3889] Enable scala check style
xubo245 commented on pull request #3891:
URL: https://github.com/apache/carbondata/pull/3891#issuecomment-678023289

LGTM
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3892: [CARBONDATA-3925] flink write carbon file to hdfs when file size is less than 1M,can't write
CarbonDataQA1 commented on pull request #3892:
URL: https://github.com/apache/carbondata/pull/3892#issuecomment-678023200

Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2086/
[GitHub] [carbondata] ajantha-bhat edited a comment on pull request #3887: [CARBONDATA-3830] Support Array and Struct of all primitive type reading from presto
ajantha-bhat edited a comment on pull request #3887:
URL: https://github.com/apache/carbondata/pull/3887#issuecomment-678010210

@marchpure :

> Is there a testcase about STRUCT, in which we decode BINARY by BASE64.

Changes are done for presto read (**hex or base64 decode for binary is needed during the write (converter) step**, not at the read step). Once the decode is done, the writer stores the value as a byte[]; presto/spark just reads the byte[].

> we shall combine 3 commits into single commit. right?

This doesn't matter; once all review is done, I can squash, or the committer will squash and merge.
[GitHub] [carbondata] ajantha-bhat commented on pull request #3887: [CARBONDATA-3830] Support Array and Struct of all primitive type reading from presto
ajantha-bhat commented on pull request #3887:
URL: https://github.com/apache/carbondata/pull/3887#issuecomment-678010210

@marchpure :

> Is there a testcase about STRUCT, in which we decode BINARY by BASE64.

Changes are done for presto read (**binary or base64 decode is needed during the write (converter) step**, not the read step). Once the decode is done, the writer stores the value as a byte[]; presto/spark just reads the byte[].

> we shall combine 3 commits into single commit. right?

This doesn't matter; once all review is done, I can squash, or the committer will squash and merge.
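The write-time-decode flow described above can be sketched with `java.util.Base64` standing in for the converter step (illustration only, not CarbonData's converter code; per the comment, the real converter handles binary or base64 input during load):

```scala
import java.util.Base64

// Converter step at write time: decode the base64 text once and store bytes.
def convertAtWrite(base64Text: String): Array[Byte] =
  Base64.getDecoder.decode(base64Text)

// Read path: no decoding, the stored byte[] is returned as-is.
def readBinary(stored: Array[Byte]): Array[Byte] = stored

val stored = convertAtWrite(Base64.getEncoder.encodeToString("carbon".getBytes("UTF-8")))
assert(new String(readBinary(stored), "UTF-8") == "carbon")
```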
[GitHub] [carbondata] QiangCai commented on a change in pull request #3891: [CARBONDATA-3889] Enable scala check style
QiangCai commented on a change in pull request #3891:
URL: https://github.com/apache/carbondata/pull/3891#discussion_r474375939

## File path: pom.xml

## @@ -431,9 +431,10 @@
           false
           ${basedir}/src/main/scala
           ${basedir}/src/test/scala
-          ${dev.path}/scalastyle-config.xml
+          scalastyle-config.xml

Review comment:
    Moved scalastyle-config.xml into the parent project folder.
[GitHub] [carbondata] xubo245 commented on a change in pull request #3891: [CARBONDATA-3889] Enable scala check style
xubo245 commented on a change in pull request #3891:
URL: https://github.com/apache/carbondata/pull/3891#discussion_r474373904

## File path: pom.xml

## @@ -431,9 +431,10 @@
           false
           ${basedir}/src/main/scala
           ${basedir}/src/test/scala
-          ${dev.path}/scalastyle-config.xml
+          scalastyle-config.xml

Review comment:
    Why did you remove ${dev.path}/?
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3876: TestingCI
CarbonDataQA1 commented on pull request #3876:
URL: https://github.com/apache/carbondata/pull/3876#issuecomment-677927680

Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2085/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3876: TestingCI
CarbonDataQA1 commented on pull request #3876:
URL: https://github.com/apache/carbondata/pull/3876#issuecomment-677927481

Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3826/
[GitHub] [carbondata] Karan980 commented on pull request #3876: TestingCI
Karan980 commented on pull request #3876:
URL: https://github.com/apache/carbondata/pull/3876#issuecomment-677873437

retest this please
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3896: [CARBONDATA-3955] Fix load failures due to daylight saving time changes
CarbonDataQA1 commented on pull request #3896:
URL: https://github.com/apache/carbondata/pull/3896#issuecomment-677856451

Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2084/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3896: [CARBONDATA-3955] Fix load failures due to daylight saving time changes
CarbonDataQA1 commented on pull request #3896:
URL: https://github.com/apache/carbondata/pull/3896#issuecomment-677852439

Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3825/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3865: [CARBONDATA-3928] Handled the Strings which length is greater than 32000 as a bad record.
CarbonDataQA1 commented on pull request #3865:
URL: https://github.com/apache/carbondata/pull/3865#issuecomment-677851709

Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3824/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3865: [CARBONDATA-3928] Handled the Strings which length is greater than 32000 as a bad record.
CarbonDataQA1 commented on pull request #3865:
URL: https://github.com/apache/carbondata/pull/3865#issuecomment-677851535

Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2083/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3876: TestingCI
CarbonDataQA1 commented on pull request #3876: URL: https://github.com/apache/carbondata/pull/3876#issuecomment-677849591 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2082/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3876: TestingCI
CarbonDataQA1 commented on pull request #3876: URL: https://github.com/apache/carbondata/pull/3876#issuecomment-677848269 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3823/
[GitHub] [carbondata] ShreelekhyaG commented on pull request #3896: [CARBONDATA-3955] Fix load failures due to daylight saving time changes
ShreelekhyaG commented on pull request #3896: URL: https://github.com/apache/carbondata/pull/3896#issuecomment-677798305 retest this please
[GitHub] [carbondata] nihal0107 commented on pull request #3865: [CARBONDATA-3928] Handled the Strings which length is greater than 32000 as a bad record.
nihal0107 commented on pull request #3865: URL: https://github.com/apache/carbondata/pull/3865#issuecomment-677795141 retest this please.
[GitHub] [carbondata] Karan980 commented on pull request #3876: TestingCI
Karan980 commented on pull request #3876: URL: https://github.com/apache/carbondata/pull/3876#issuecomment-677792448 retest this please
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3778: [WIP][CARBONDATA-3916] Support array complex type with SI
CarbonDataQA1 commented on pull request #3778: URL: https://github.com/apache/carbondata/pull/3778#issuecomment-677790243 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2079/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3778: [WIP][CARBONDATA-3916] Support array complex type with SI
CarbonDataQA1 commented on pull request #3778: URL: https://github.com/apache/carbondata/pull/3778#issuecomment-677789319 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3820/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3865: [CARBONDATA-3928] Handled the Strings which length is greater than 32000 as a bad record.
CarbonDataQA1 commented on pull request #3865: URL: https://github.com/apache/carbondata/pull/3865#issuecomment-677788548 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2081/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3865: [CARBONDATA-3928] Handled the Strings which length is greater than 32000 as a bad record.
CarbonDataQA1 commented on pull request #3865: URL: https://github.com/apache/carbondata/pull/3865#issuecomment-677788476 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3822/
[GitHub] [carbondata] marchpure edited a comment on pull request #3887: [CARBONDATA-3830] Support Array and Struct of all primitive type reading from presto
marchpure edited a comment on pull request #3887: URL: https://github.com/apache/carbondata/pull/3887#issuecomment-677786847 Is there a testcase for STRUCT in which we decode BINARY with BASE64? We shall combine the 3 commits into a single commit, right?
[GitHub] [carbondata] marchpure commented on pull request #3887: [CARBONDATA-3830] Support Array and Struct of all primitive type reading from presto
marchpure commented on pull request #3887: URL: https://github.com/apache/carbondata/pull/3887#issuecomment-677786847 Is there a testcase for STRUCT in which we decode BINARY with BASE64? We shall combine the 3 commits into a single commit, right?
[GitHub] [carbondata] marchpure removed a comment on pull request #3885: [CARBONDATA-3946] Support IndexServer with Presto Engine
marchpure removed a comment on pull request #3885: URL: https://github.com/apache/carbondata/pull/3885#issuecomment-677785897 Is there a testcase for STRUCT in which we decode BINARY with BASE64?
[GitHub] [carbondata] marchpure commented on pull request #3885: [CARBONDATA-3946] Support IndexServer with Presto Engine
marchpure commented on pull request #3885: URL: https://github.com/apache/carbondata/pull/3885#issuecomment-677785897 Is there a testcase for STRUCT in which we decode BINARY with BASE64?
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3896: [CARBONDATA-3955] Fix load failures due to daylight saving time changes
CarbonDataQA1 commented on pull request #3896: URL: https://github.com/apache/carbondata/pull/3896#issuecomment-677783679 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3817/
[GitHub] [carbondata] nihal0107 commented on a change in pull request #3865: [CARBONDATA-3928] Handled the Strings which length is greater than 32000 as a bad record.
nihal0107 commented on a change in pull request #3865: URL: https://github.com/apache/carbondata/pull/3865#discussion_r474138074
## File path: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
## @@ -2468,4 +2468,10 @@ private CarbonCommonConstants() {
   * index server temp folder aging period default value 3hours.
   */
  public static final String CARBON_INDEXSERVER_TEMPFOLDER_DELETETIME_DEFAULT = "1080";
+
+  public static final String STRING_LENGTH_EXCEEDED_MESSAGE =
+      "Record %s of column %s exceeded " + MAX_CHARS_PER_COLUMN_DEFAULT +
+      " characters. Please consider long string data type.";
+
+  public static final String FORCE_BAD_RECORD_ACTION = "FORCE";
Review comment: done
[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3865: [CARBONDATA-3928] Handled the Strings which length is greater than 32000 as a bad record.
VenuReddy2103 commented on a change in pull request #3865: URL: https://github.com/apache/carbondata/pull/3865#discussion_r474130519
## File path: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
## @@ -2468,4 +2468,10 @@ private CarbonCommonConstants() {
   * index server temp folder aging period default value 3hours.
   */
  public static final String CARBON_INDEXSERVER_TEMPFOLDER_DELETETIME_DEFAULT = "1080";
+
+  public static final String STRING_LENGTH_EXCEEDED_MESSAGE =
+      "Record %s of column %s exceeded " + MAX_CHARS_PER_COLUMN_DEFAULT +
+      " characters. Please consider long string data type.";
+
+  public static final String FORCE_BAD_RECORD_ACTION = "FORCE";
Review comment: This string constant does not seem to be used anywhere. Please remove it.
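The `%s` placeholders in the constant under discussion are ordinary `String.format` specifiers. A minimal, self-contained sketch of how such a message would typically be rendered; the message text is paraphrased from the diff, `MAX_CHARS_PER_COLUMN_DEFAULT` is inlined as 32000 for this standalone example, and the row/column values are hypothetical:

```java
public class BadRecordMessageSketch {
    // Mirrors STRING_LENGTH_EXCEEDED_MESSAGE from the diff; the
    // MAX_CHARS_PER_COLUMN_DEFAULT constant is inlined as 32000 here.
    static final String STRING_LENGTH_EXCEEDED_MESSAGE =
        "Record %s of column %s exceeded 32000 characters. "
            + "Please consider long string data type.";

    public static void main(String[] args) {
        // Hypothetical bad record: row 5, column dim2.
        String msg = String.format(STRING_LENGTH_EXCEEDED_MESSAGE, 5, "dim2");
        // Prints: Record 5 of column dim2 exceeded 32000 characters.
        //         Please consider long string data type.
        System.out.println(msg);
    }
}
```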
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3896: [CARBONDATA-3955] Fix load failures due to daylight saving time changes
CarbonDataQA1 commented on pull request #3896: URL: https://github.com/apache/carbondata/pull/3896#issuecomment-677768606 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2076/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3887: [CARBONDATA-3830] Support Array and Struct of all primitive type reading from presto
CarbonDataQA1 commented on pull request #3887: URL: https://github.com/apache/carbondata/pull/3887#issuecomment-677750531 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2075/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3887: [CARBONDATA-3830] Support Array and Struct of all primitive type reading from presto
CarbonDataQA1 commented on pull request #3887: URL: https://github.com/apache/carbondata/pull/3887#issuecomment-677744032 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3816/
[GitHub] [carbondata] akashrn5 commented on a change in pull request #3873: [CARBONDATA-3956] Reindex command on SI table
akashrn5 commented on a change in pull request #3873: URL: https://github.com/apache/carbondata/pull/3873#discussion_r474030501
## File path: index/secondary-index/src/test/scala/org/apache/carbondata/spark/testsuite/secondaryindex/TestIndexRepair.scala
## @@ -0,0 +1,200 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.carbondata.spark.testsuite.secondaryindex
+
+import org.scalatest.BeforeAndAfterAll
+
+import org.apache.spark.sql.test.util.QueryTest
+
+/**
+ * test cases for testing create index table
+ */
+class TestIndexRepair extends QueryTest with BeforeAndAfterAll {
+
+  override def beforeAll {
+    sql("drop table if exists maintable")
+    sql("drop table if exists indextable1")
+    sql("drop table if exists indextable2")
+  }
+
+  test("reindex command after deleting segments from SI table") {
+    sql("drop table if exists maintable")
+    sql("CREATE TABLE maintable(a INT, b STRING, c STRING) stored as carbondata")
+    sql("CREATE INDEX indextable1 on table maintable(c) as 'carbondata'")
+    sql("INSERT INTO maintable SELECT 1,'string1', 'string2'")
+    sql("INSERT INTO maintable SELECT 1,'string1', 'string2'")
+    val preDeleteSegments = sql("SHOW SEGMENTS FOR TABLE INDEXTABLE1").count()
+    sql("DELETE FROM TABLE INDEXTABLE1 WHERE SEGMENT.ID IN(0,1)")
+    sql("CLEAN FILES FOR TABLE INDEXTABLE1")
+    val postDeleteSegments = sql("SHOW SEGMENTS FOR TABLE INDEXTABLE1").count()
+    assert(preDeleteSegments != postDeleteSegments)
+    sql("REINDEX INDEX TABLE indextable1 ON MAINTABLE")
+    val postRepairSegments = sql("SHOW SEGMENTS FOR TABLE INDEXTABLE1").count()
+    assert(preDeleteSegments == postRepairSegments)
Review comment: You should also consider adding a query and checking whether it hits SI after reindex. Please check this for all test cases.
## File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/index/IndexRepairCommand.scala
## @@ -0,0 +1,127 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.command.index
+
+import java.util
+
+import scala.collection.JavaConverters._
+
+import org.apache.spark.sql.{CarbonEnv, Row, SparkSession}
+import org.apache.spark.sql.catalyst.TableIdentifier
+import org.apache.spark.sql.execution.command.DataCommand
+import org.apache.spark.sql.hive.CarbonRelation
+import org.apache.spark.sql.index.CarbonIndexUtil
+
+import org.apache.carbondata.common.logging.LogServiceFactory
+import org.apache.carbondata.core.metadata.index.IndexType
+import org.apache.carbondata.core.statusmanager.{LoadMetadataDetails, SegmentStatusManager}
+import org.apache.carbondata.core.util.path.CarbonTablePath
+import org.apache.carbondata.processing.loading.model.{CarbonDataLoadSchema, CarbonLoadModel}
+
+/**
+ * Show indexes on the table
+ */
+case class IndexRepairCommand(indexname: Option[String], tableNameOp: TableIdentifier,
+    dbName: String,
+    segments: Option[List[String]]) extends DataCommand {
+
+  private val LOGGER = LogServiceFactory.getLogService(this.getClass.getName)
+
+  def processData(sparkSession: SparkSession): Seq[Row] = {
+    if (dbName == null) {
+      // table level and index level
+      val databaseName = if (tableNameOp.database.isEmpty) {
+        SparkSession.getActiveSession.get.catalog.currentDatabase
+      } else {
+        tableNameOp.
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3892: [CARBONDATA-3925] flink write carbon file to hdfs when file size is less than 1M,can't write
CarbonDataQA1 commented on pull request #3892: URL: https://github.com/apache/carbondata/pull/3892#issuecomment-677736438 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3821/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3892: [CARBONDATA-3925] flink write carbon file to hdfs when file size is less than 1M,can't write
CarbonDataQA1 commented on pull request #3892: URL: https://github.com/apache/carbondata/pull/3892#issuecomment-677734623 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2080/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3778: [WIP][CARBONDATA-3916] Support array complex type with SI
CarbonDataQA1 commented on pull request #3778: URL: https://github.com/apache/carbondata/pull/3778#issuecomment-677708488 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3818/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3778: [WIP][CARBONDATA-3916] Support array complex type with SI
CarbonDataQA1 commented on pull request #3778: URL: https://github.com/apache/carbondata/pull/3778#issuecomment-677708011 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2077/
[GitHub] [carbondata] akashrn5 commented on pull request #3865: [CARBONDATA-3928] Handled the Strings which length is greater than 32000 as a bad record.
akashrn5 commented on pull request #3865: URL: https://github.com/apache/carbondata/pull/3865#issuecomment-677699305 LGTM
[GitHub] [carbondata] ShreelekhyaG commented on pull request #3896: [CARBONDATA-3955] Fix load failures due to daylight saving time changes
ShreelekhyaG commented on pull request #3896: URL: https://github.com/apache/carbondata/pull/3896#issuecomment-677686802 retest this please
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3896: [CARBONDATA-3955] Fix load failures due to daylight saving time changes
CarbonDataQA1 commented on pull request #3896: URL: https://github.com/apache/carbondata/pull/3896#issuecomment-677681239 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2074/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3876: TestingCI
CarbonDataQA1 commented on pull request #3876: URL: https://github.com/apache/carbondata/pull/3876#issuecomment-677679732 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3813/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3876: TestingCI
CarbonDataQA1 commented on pull request #3876: URL: https://github.com/apache/carbondata/pull/3876#issuecomment-677670245 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2072/
[GitHub] [carbondata] ajantha-bhat commented on pull request #3892: [CARBONDATA-3925] flink write carbon file to hdfs when file size is less than 1M,can't write
ajantha-bhat commented on pull request #3892: URL: https://github.com/apache/carbondata/pull/3892#issuecomment-677668597 @yutaoChina : you can run `mvn clean install -Pspark-2.3 -Pbuild-with-format -DskipTests` in your IDE to find all findbugs, checkstyle, and scalastyle issues. It still has some checkstyle issues now.
[GitHub] [carbondata] ajantha-bhat commented on pull request #3887: [CARBONDATA-3830] Support Array and Struct of all primitive type reading from presto
ajantha-bhat commented on pull request #3887: URL: https://github.com/apache/carbondata/pull/3887#issuecomment-677664277 retest this please
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3778: [CARBONDATA-3916] Support array complex type with SI
CarbonDataQA1 commented on pull request #3778: URL: https://github.com/apache/carbondata/pull/3778#issuecomment-677647032 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2069/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3896: [CARBONDATA-3955] Fix load failures due to daylight saving time changes
CarbonDataQA1 commented on pull request #3896: URL: https://github.com/apache/carbondata/pull/3896#issuecomment-677642444 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3815/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3778: [CARBONDATA-3916] Support array complex type with SI
CarbonDataQA1 commented on pull request #3778: URL: https://github.com/apache/carbondata/pull/3778#issuecomment-677642207 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3810/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3887: [CARBONDATA-3830] Support Array and Struct of all primitive type reading from presto
CarbonDataQA1 commented on pull request #3887: URL: https://github.com/apache/carbondata/pull/3887#issuecomment-677636575 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3814/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3887: [CARBONDATA-3830] Support Array and Struct of all primitive type reading from presto
CarbonDataQA1 commented on pull request #3887: URL: https://github.com/apache/carbondata/pull/3887#issuecomment-677635399 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2073/
[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3856: [CARBONDATA-3929]Improve CDC performance
ajantha-bhat commented on a change in pull request #3856: URL: https://github.com/apache/carbondata/pull/3856#discussion_r473876207
## File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/mutation/merge/CarbonMergeDataSetCommand.scala
## @@ -106,18 +106,34 @@ case class CarbonMergeDataSetCommand(
   // decide join type based on match conditions
   val joinType = decideJoinType
+  val joinColumn = mergeMatches.joinExpr.expr.asInstanceOf[EqualTo].left
+    .asInstanceOf[UnresolvedAttribute].nameParts.tail.head
+  // repartition the the srsDs, if the target as bucketing and the bucketing column and join
Review comment:
```suggestion
  // repartition the srcDs, if the target has bucketing and the bucketing column and join
```
## File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/mutation/merge/CarbonMergeDataSetCommand.scala
## @@ -106,18 +106,34 @@ case class CarbonMergeDataSetCommand(
+  // repartition the the srsDs, if the target as bucketing and the bucketing column and join
+  // column are same
+  val repartitionedSrsDs =
Review comment:
```suggestion
  val repartitionedSrcDs =
```
## File path: integration/spark/src/main/spark2.3/org/apache/spark/sql/avro/AvroFileFormatFactory.scala
## @@ -0,0 +1,49 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.avro
+
+import com.databricks.spark.avro.{AvroReader, AvroWriter}
+import org.apache.spark.rdd.RDD
+import org.apache.spark.sql.Row
+import org.apache.spark.sql.execution.datasources.OutputWriterFactory
+
+object AvroFileFormatFactory {
Review comment: Same doubt as above; maybe just use databricks spark-avro for both 2.3 and 2.4.
## File path: integration/spark/pom.xml
## @@ -153,6 +153,28 @@
+    com.databricks
+    spark-avro_${scala.binary.version}
+    4.0.0
+
+    org.apache.avro
+    avro
+
+    org.apache.spark
+    spark-avro_${scala.binary.version}
Review comment: Why can't spark 2.3 and 2.4 both use databricks spark-avro? I can understand that the other way around is not possible (for both to use spark-avro).
## File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/management/CarbonInsertIntoCommand.scala
## @@ -439,6 +449,11 @@ case class CarbonInsertIntoCommand(databaseNameOp: Option[String],
   def insertData(loadParams: CarbonLoadParams): (Seq[Row], LoadMetadataDetails) = {
     var rows = Seq.empty[Row]
+    val loadDataFrame = if (updateModel.isDefined && !updateModel.get.loadAsNewSegment) {
+      Some(CommonLoadUtils.getDataFrameWithTupleID(Some(dataFrame)))
Review comment: This InsertIntoCommand flow is not meant for the update flow yet, because update will have an implicit column, and rearranging the schema will fail. So I suggest: if `updateModel.get.loadAsNewSegment` is `false`, throw an unsupported-operation exception now and handle this requirement later.
Also, when `updateModel.get.loadAsNewSegment = true` (which is our current cdc history data case), **this flow can be used** (as it is just an insert; no actual update flow is used). Only when `updateModel.get.loadAsNewSegment = false` can we not use this flow, and someone might use it because of the update model support. So I suggest throwing an exception at the beginning of this function when `updateModel.get.loadAsNewSegment = false`.
## File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/management/CarbonInsertIntoCommand.scala
## @@ -439,6 +449,11 @@ case class CarbonInsertIntoCommand(databaseNameOp: Option[String],
   def insertData(loadParams: CarbonLoadParams): (Seq[Row], LoadMetadataDetails) = {
     var rows = Seq.empty[Row]
+    val loadDataFrame = if (updateModel.i
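The repartition decision discussed above — repartition the source dataset only when the target table is bucketed and its bucketing column equals the join column — reduces to a simple predicate. The sketch below is an illustrative model, not the CarbonData API; the method and parameter names are hypothetical:

```java
import java.util.List;

public class RepartitionDecisionSketch {
    // Hypothetical predicate mirroring the review discussion: the source
    // dataset is repartitioned only if the target has bucket columns and
    // the join column is one of them.
    static boolean shouldRepartitionSource(String joinColumn, List<String> targetBucketColumns) {
        return targetBucketColumns != null && targetBucketColumns.contains(joinColumn);
    }

    public static void main(String[] args) {
        System.out.println(shouldRepartitionSource("id", List.of("id")));   // bucketed on the join column
        System.out.println(shouldRepartitionSource("id", List.of("name"))); // bucketed on a different column
        System.out.println(shouldRepartitionSource("id", null));            // target not bucketed at all
    }
}
```

Repartitioning the source to match the target's bucketing lets the subsequent join avoid a shuffle of the target side, which is the performance motivation behind the change.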
[GitHub] [carbondata] Karan980 commented on pull request #3876: TestingCI
Karan980 commented on pull request #3876: URL: https://github.com/apache/carbondata/pull/3876#issuecomment-677573694 retest this please
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3892: [CARBONDATA-3925] flink write carbon file to hdfs when file size is less than 1M,can't write
CarbonDataQA1 commented on pull request #3892: URL: https://github.com/apache/carbondata/pull/3892#issuecomment-677550766 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3809/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3892: [CARBONDATA-3925] flink write carbon file to hdfs when file size is less than 1M,can't write
CarbonDataQA1 commented on pull request #3892: URL: https://github.com/apache/carbondata/pull/3892#issuecomment-677539801 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2068/
[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #3778: [CARBONDATA-3916] Support array complex type with SI
Indhumathi27 commented on a change in pull request #3778: URL: https://github.com/apache/carbondata/pull/3778#discussion_r473873670

## File path: integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/command/SICreationCommand.scala ##
@@ -443,10 +443,34 @@ private[sql] case class CarbonCreateSecondaryIndexCommand(
   databaseName: String, tableName: String, indexTableName: String,
   absoluteTableIdentifier: AbsoluteTableIdentifier): TableInfo = {
   var schemaOrdinal = -1
-  var allColumns = indexModel.columnNames.map { indexCol =>
-    val colSchema = carbonTable.getDimensionByName(indexCol).getColumnSchema
+  val complexDimensions = carbonTable.getAllDimensions.asScala
+    .filter(dim => dim.getDataType.isComplexType &&
+      indexModel.columnNames.asJava.contains(dim.getColName))
+  if (complexDimensions.size > 1) {
+    throw new ErrorMessage("SI creation with more than one complex type is not supported yet");
+  }
+  var allColumns = List[ColumnSchema]()

Review comment: handled
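The validation added in that diff can be sketched in isolation as follows. This is a hedged, simplified illustration: `Dimension` and `IllegalArgumentException` here stand in for Carbon's `CarbonDimension` and `ErrorMessage`, which are not reproduced.

```scala
// Simplified stand-in for CarbonDimension (assumption, not Carbon's class).
case class Dimension(name: String, isComplex: Boolean)

// Reject a secondary-index definition that names more than one complex-type
// column, mirroring the check added in SICreationCommand.
def validateComplexColumns(allDims: Seq[Dimension], indexCols: Seq[String]): Unit = {
  val complexInIndex = allDims.filter(d => d.isComplex && indexCols.contains(d.name))
  if (complexInIndex.size > 1) {
    throw new IllegalArgumentException(
      "SI creation with more than one complex type is not supported yet")
  }
}
```

A single complex column (or none) passes; naming two or more complex columns in the index fails at creation time rather than at query time.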
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3876: TestingCI
CarbonDataQA1 commented on pull request #3876: URL: https://github.com/apache/carbondata/pull/3876#issuecomment-677513013 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3808/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3876: TestingCI
CarbonDataQA1 commented on pull request #3876: URL: https://github.com/apache/carbondata/pull/3876#issuecomment-677508525 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2067/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3887: [CARBONDATA-3830] Support Array and Struct of all primitive type reading from presto
CarbonDataQA1 commented on pull request #3887: URL: https://github.com/apache/carbondata/pull/3887#issuecomment-677495917 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3807/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3887: [CARBONDATA-3830] Support Array and Struct of all primitive type reading from presto
CarbonDataQA1 commented on pull request #3887: URL: https://github.com/apache/carbondata/pull/3887#issuecomment-677489268 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2066/
[GitHub] [carbondata] yutaoChina commented on pull request #3892: flink write carbon file to hdfs when file size is less than 1M,can't write
yutaoChina commented on pull request #3892: URL: https://github.com/apache/carbondata/pull/3892#issuecomment-677487173

> @yutaoChina : Thanks for working on this.
> a) please handle the compilation error
> b) please create a jira issue and add it in the issue header

my jira id is yutaochina
[GitHub] [carbondata] Karan980 commented on pull request #3876: TestingCI
Karan980 commented on pull request #3876: URL: https://github.com/apache/carbondata/pull/3876#issuecomment-677449345 retest this please
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3892: flink write carbon file to hdfs when file size is less than 1M,can't write
CarbonDataQA1 commented on pull request #3892: URL: https://github.com/apache/carbondata/pull/3892#issuecomment-677445838 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2064/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3892: flink write carbon file to hdfs when file size is less than 1M,can't write
CarbonDataQA1 commented on pull request #3892: URL: https://github.com/apache/carbondata/pull/3892#issuecomment-677439466 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3805/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3876: TestingCI
CarbonDataQA1 commented on pull request #3876: URL: https://github.com/apache/carbondata/pull/3876#issuecomment-677439289 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3804/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3876: TestingCI
CarbonDataQA1 commented on pull request #3876: URL: https://github.com/apache/carbondata/pull/3876#issuecomment-677438159 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2063/