[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4043: IUD Concurrency Improvement
CarbonDataQA2 commented on pull request #4043: URL: https://github.com/apache/carbondata/pull/4043#issuecomment-739619817 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3329/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] shenjiayu17 commented on a change in pull request #4012: [CARBONDATA-4051] Geo spatial index algorithm improvement and UDFs enhancement
shenjiayu17 commented on a change in pull request #4012:
URL: https://github.com/apache/carbondata/pull/4012#discussion_r533974578

## File path: geo/src/main/java/org/apache/carbondata/geo/scan/expression/PolylineListExpression.java ##
@@ -0,0 +1,202 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.geo.scan.expression;
+
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
+
+import org.apache.carbondata.common.annotations.InterfaceAudience;
+import org.apache.carbondata.core.datastore.block.SegmentProperties;
+import org.apache.carbondata.core.metadata.datatype.DataTypes;
+import org.apache.carbondata.core.scan.expression.ColumnExpression;
+import org.apache.carbondata.core.scan.expression.Expression;
+import org.apache.carbondata.core.scan.expression.ExpressionResult;
+import org.apache.carbondata.core.scan.expression.UnknownExpression;
+import org.apache.carbondata.core.scan.expression.conditional.ConditionalExpression;
+import org.apache.carbondata.core.scan.filter.executer.FilterExecutor;
+import org.apache.carbondata.core.scan.filter.intf.ExpressionType;
+import org.apache.carbondata.core.scan.filter.intf.RowIntf;
+import org.apache.carbondata.core.scan.filter.resolver.FilterResolverIntf;
+import org.apache.carbondata.core.scan.filter.resolver.RowLevelFilterResolverImpl;
+import org.apache.carbondata.core.util.CustomIndex;
+import org.apache.carbondata.geo.GeoConstants;
+import org.apache.carbondata.geo.GeoHashIndex;
+import org.apache.carbondata.geo.GeoHashUtils;
+import org.apache.carbondata.geo.GeoOperationType;
+import org.apache.carbondata.geo.scan.filter.executor.PolygonFilterExecutorImpl;
+
+import org.locationtech.jts.geom.Coordinate;
+import org.locationtech.jts.geom.Geometry;
+import org.locationtech.jts.geom.GeometryFactory;
+import org.locationtech.jts.geom.LineString;
+import org.locationtech.jts.geom.Polygon;
+import org.locationtech.jts.io.WKTReader;
+import org.locationtech.jts.operation.buffer.BufferParameters;
+
+/**
+ * InPolylineList expression processor. It inputs the InPolylineList string to the Geo
+ * implementation's query method, gets a list of range of IDs from each polygon and
+ * calculates the and/or/diff range list to filter as an output. And then, build
+ * InExpression with list of all the IDs present in those list of ranges.
+ */
+@InterfaceAudience.Internal
+public class PolylineListExpression extends UnknownExpression
+    implements ConditionalExpression {
+
+  private static final GeometryFactory geoFactory = new GeometryFactory();
+
+  private String polylineString;
+
+  private Float bufferInMeter;
+
+  private GeoHashIndex instance;
+
+  private List ranges = new ArrayList();
+
+  private ColumnExpression column;
+
+  private static final ExpressionResult trueExpRes =
+      new ExpressionResult(DataTypes.BOOLEAN, true);
+
+  private static final ExpressionResult falseExpRes =
+      new ExpressionResult(DataTypes.BOOLEAN, false);
+
+  public PolylineListExpression(String polylineString, Float bufferInMeter, String columnName,
+      CustomIndex indexInstance) {
+    this.polylineString = polylineString;
+    this.bufferInMeter = bufferInMeter;
+    this.instance = (GeoHashIndex) indexInstance;
+    this.column = new ColumnExpression(columnName, DataTypes.LONG);
+  }
+
+  private void processExpression() {
+    try {
+      // transform the distance unit meter to degree
+      double buffer = bufferInMeter / GeoConstants.CONVERSION_FACTOR_OF_METER_TO_DEGREE;
+
+      // 1. parse the polyline list string and get polygon from each polyline
+      List polygonList = new ArrayList<>();
+      WKTReader wktReader = new WKTReader();
+      Pattern pattern = Pattern.compile(GeoConstants.POLYLINE_REG_EXPRESSION);
+      Matcher matcher = pattern.matcher(polylineString);
+      while (matcher.find()) {
+        String matchedStr = matcher.group();
+        LineString polylineCreatedFromStr = (LineString) wktReader.read(matchedStr);
+        Polygon polygonFromPolylineBuffer = (Polygon) polylineCreatedFromStr.buffer(
+            buffer, 0, BufferParameters.CA
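The visible part of `processExpression()` converts the buffer from meters to degrees and then pulls each polyline out of the input string with a regular expression. A minimal stand-alone sketch of those two steps, using only the JDK, might look like the following. The factor `111_320.0` and the `LINESTRING` pattern are assumptions standing in for `GeoConstants.CONVERSION_FACTOR_OF_METER_TO_DEGREE` and `GeoConstants.POLYLINE_REG_EXPRESSION`, whose values are not shown in the quoted diff; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PolylineParseSketch {
  // Assumed value: roughly the number of meters in one degree at the equator.
  static final double METERS_PER_DEGREE = 111_320.0;

  // Hypothetical pattern for WKT-style "LINESTRING (x y, x y, ...)" items.
  static final Pattern LINESTRING =
      Pattern.compile("LINESTRING\\s*\\([^()]*\\)", Pattern.CASE_INSENSITIVE);

  // Converts a buffer distance in meters to an approximate distance in degrees.
  static double metersToDegrees(double bufferInMeter) {
    return bufferInMeter / METERS_PER_DEGREE;
  }

  // Returns every LINESTRING item found in the input, in order of appearance.
  static List<String> extractPolylines(String polylineList) {
    List<String> result = new ArrayList<>();
    Matcher matcher = LINESTRING.matcher(polylineList);
    while (matcher.find()) {
      result.add(matcher.group());
    }
    return result;
  }
}
```

In the quoted code each matched string is then parsed by JTS `WKTReader` and widened with `buffer(...)`; that part is left out here so the sketch stays dependency-free.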
[GitHub] [carbondata] QiangCai commented on pull request #4029: refact carbon util
QiangCai commented on pull request #4029: URL: https://github.com/apache/carbondata/pull/4029#issuecomment-739614278 please rebase and perfect the title
[GitHub] [carbondata] Zhangshunyu commented on pull request #4043: IUD Concurrency Improvement
Zhangshunyu commented on pull request #4043: URL: https://github.com/apache/carbondata/pull/4043#issuecomment-739612930 retest this please
[GitHub] [carbondata] Zhangshunyu closed pull request #4040: [WIP][CI TEST]
Zhangshunyu closed pull request #4040: URL: https://github.com/apache/carbondata/pull/4040
[GitHub] [carbondata] Zhangshunyu commented on pull request #4040: [WIP][CI TEST]
Zhangshunyu commented on pull request #4040: URL: https://github.com/apache/carbondata/pull/4040#issuecomment-739609664 retest this please
[GitHub] [carbondata] shenjiayu17 commented on pull request #4012: [CARBONDATA-4051] Geo spatial index algorithm improvement and UDFs enhancement
shenjiayu17 commented on pull request #4012: URL: https://github.com/apache/carbondata/pull/4012#issuecomment-739609378 retest this please
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4044: [CARBONDATA-4062] Refactor clean files feature
CarbonDataQA2 commented on pull request #4044: URL: https://github.com/apache/carbondata/pull/4044#issuecomment-739562913
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4044: [CARBONDATA-4062] Refactor clean files feature
CarbonDataQA2 commented on pull request #4044: URL: https://github.com/apache/carbondata/pull/4044#issuecomment-739547305 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3304/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4044: [CARBONDATA-4062] Refactor clean files feature
CarbonDataQA2 commented on pull request #4044: URL: https://github.com/apache/carbondata/pull/4044#issuecomment-739547241 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/5085/
[GitHub] [carbondata] QiangCai opened a new pull request #4044: [CARBONDATA-4062] Refactor clean files feature
QiangCai opened a new pull request #4044:
URL: https://github.com/apache/carbondata/pull/4044

### Why is this PR needed?
To prevent accidental deletion of data, carbon will introduce trash data management. It provides a buffer period during which an accidental delete operation can be rolled back.

Trash data management is a part of carbon data lifecycle management. Clean files, as the data trash manager, should cover the following two parts.

part 1: manage metadata-indexed data trash.
This data stays at its original location in the table and is indexed by metadata. carbon manages this data through the metadata index and should avoid using the listFile() interface.

part 2: manage the ".Trash" folder.
Currently the ".Trash" folder has no metadata index, and operations on it are based on timestamps and the listFile() interface. In the future, carbon will index the ".Trash" folder to improve data trash management.

### What changes were proposed in this PR?
remove the data clean function from all features, but keep the exception-handling part
Notes: the following features still clean data
a) drop table/database/partition/index/mv
b) insert/load overwrite table/partition

only the clean files function works as a data trash manager now

support concurrent operation with other features (loading, compaction, update/delete, and so on)

### Does this PR introduce any user interface change?
- Yes. (please explain the change and update document)

### Is any new testcase added?
- No
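The timestamp-based handling of the ".Trash" folder described in part 2 can be illustrated with a small sketch. It assumes, purely for illustration, that each entry returned by a directory listing is a sub-folder named after its creation time in epoch milliseconds and that a fixed retention window applies; neither the folder layout nor the names `TrashExpirySketch` and `expiredFolders` come from the PR.

```java
import java.util.ArrayList;
import java.util.List;

class TrashExpirySketch {
  /**
   * Returns the names of trash sub-folders older than the retention window.
   * Folder names are assumed (hypothetically) to be epoch-millisecond timestamps.
   */
  static List<String> expiredFolders(List<String> folderNames, long nowMillis,
      long retentionMillis) {
    List<String> expired = new ArrayList<>();
    for (String name : folderNames) {
      long createdAt;
      try {
        createdAt = Long.parseLong(name);
      } catch (NumberFormatException e) {
        // Not a timestamp folder: leave it alone rather than guess.
        continue;
      }
      if (nowMillis - createdAt > retentionMillis) {
        expired.add(name);
      }
    }
    return expired;
  }
}
```

Because every decision here needs a full listing of the folder, the cost grows with the number of entries, which is consistent with the PR's stated plan to index the ".Trash" folder later.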
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4043: IUD Concurrency Improvement
CarbonDataQA2 commented on pull request #4043: URL: https://github.com/apache/carbondata/pull/4043#issuecomment-739539209
[GitHub] [carbondata] marchpure commented on a change in pull request #4043: IUD Concurrency Improvement
marchpure commented on a change in pull request #4043:
URL: https://github.com/apache/carbondata/pull/4043#discussion_r537069006

## File path: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/iud/IUDConcurrencyTestCase.scala ##
@@ -0,0 +1,474 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.spark.testsuite.iud
+
+import java.sql.Date
+import java.text.SimpleDateFormat
+import java.util.concurrent.{Callable, Executors, Future}
+
+import mockit.{Mock, MockUp}
+import org.apache.spark.sql.{DataFrame, Row, SaveMode}
+import org.apache.spark.sql.execution.command.mutation.{CarbonProjectForDeleteCommand, CarbonProjectForUpdateCommand}
+import org.apache.spark.sql.test.util.QueryTest
+import org.apache.spark.sql.types.StructType
+import org.scalatest.BeforeAndAfterAll
+
+import org.apache.carbondata.core.constants.CarbonCommonConstants
+import org.apache.carbondata.core.exception.ConcurrentOperationException
+import org.apache.carbondata.core.util.CarbonProperties
+import org.apache.carbondata.spark.rdd.CarbonDataRDDFactory
+
+class IUDConcurrencyTestCase extends QueryTest with BeforeAndAfterAll {
+
+  val ONE_LOAD_SIZE = 5
+  var testData: DataFrame = _
+
+  override def beforeAll(): Unit = {
+    sql("DROP DATABASE IF EXISTS iud_concurrency CASCADE")
+    sql("CREATE DATABASE iud_concurrency")
+    sql("USE iud_concurrency")
+
+    buildTestData()
+
+    createTable("orders", testData.schema)
+    createTable("temp_table", testData.schema)
+    createTable("orders_temp_table", testData.schema)
+
+    testData.write
+      .format("carbondata")
+      .option("tableName", "temp_table")
+      .option("tempCSV", "false")
+      .mode(SaveMode.Overwrite)
+      .save()
+
+    sql("insert into orders select * from temp_table")
+    sql("insert into orders_temp_table select * from temp_table")
+  }
+
+  private def buildTestData(): Unit = {
+    CarbonProperties.getInstance()
+      .addProperty(CarbonCommonConstants.CARBON_DATE_FORMAT, "yyyy-MM-dd")
+    import sqlContext.implicits._
+    val sdf = new SimpleDateFormat("yyyy-MM-dd")
+
+    testData = sqlContext.sparkSession.sparkContext.parallelize(1 to ONE_LOAD_SIZE)
+      .map(value => (value, new Date(sdf.parse("2015-07-" + (value % 10 + 10)).getTime),
+        "china", "aaa" + value, "phone" + 555 * value, "ASD" + (6 + value), 14999 + value,
+        "ordersTable" + value))
+      .toDF("o_id", "o_date", "o_country", "o_name",
+        "o_phonetype", "o_serialname", "o_salary", "o_comment")
+  }
+
+  private def createTable(tableName: String, schema: StructType): Unit = {
+    val schemaString = schema.fields.map(x => x.name + " " + x.dataType.typeName).mkString(", ")
+    sql(s"CREATE TABLE $tableName ($schemaString) stored as carbondata tblproperties" +
+      s"('sort_scope'='local_sort', 'sort_columns'='o_country, o_name, o_phonetype, o_serialname," +
+      s"o_comment")
+  }
+
+  // --- Insert and Update
+  // update -> insert -> update
+  test("Update should success when insert completes before it") {
+    val updateSql = "update orders set (o_country)=('newCountry') where o_country='china'"
+    val insertSql = "insert into orders select * from orders_temp_table"
+
+    val mockInsert = new MockUp[CarbonProjectForUpdateCommand]() {
+      @Mock
+      def mockForConcurrentInsertTest(): Unit = {

Review comment: you shall not mock in this way.

## File path: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/iud/IUDConcurrencyTestCase.scala ##
@@ -0,0 +1,474 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distribut
[GitHub] [carbondata] marchpure commented on pull request #4043: IUD Concurrency Improvement
marchpure commented on pull request #4043: URL: https://github.com/apache/carbondata/pull/4043#issuecomment-739523754 CI fails. Please fix the CI failures
[GitHub] [carbondata] marchpure commented on a change in pull request #4043: IUD Concurrency Improvement
marchpure commented on a change in pull request #4043:
URL: https://github.com/apache/carbondata/pull/4043#discussion_r537066618

## File path: core/src/main/java/org/apache/carbondata/core/mutate/CarbonUpdateUtil.java ##
@@ -341,8 +341,11 @@ public static boolean updateTableMetadataStatus(Set updatedSegmentsList,
           // this means for first time it is getting updated .
           loadMetadata.setUpdateDeltaStartTimestamp(updatedTimeStamp);
         }
-        // update end timestamp for each time.
+        // update delta end timestamp for each time.
         loadMetadata.setUpdateDeltaEndTimestamp(updatedTimeStamp);
+        // record end timestamp of operation each time
+        long operationEndTimestamp = System.currentTimeMillis();
+        loadMetadata.setLatestUpdateEndTimestamp(String.valueOf(operationEndTimestamp));

Review comment: take care about format

## File path: integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala ##
@@ -334,6 +334,8 @@ object CarbonDataRDDFactory {
     val segmentLock = CarbonLockFactory.getCarbonLockObj(carbonTable.getAbsoluteTableIdentifier,
       CarbonTablePath.addSegmentPrefix(carbonLoadModel.getSegmentId) + LockUsage.LOCK)
+    mockForConcurrentTest()

Review comment: remove it

## File path: core/src/main/java/org/apache/carbondata/core/statusmanager/LoadMetadataDetails.java ##
@@ -453,6 +456,14 @@ public void setExtraInfo(String extraInfo) {
     this.extraInfo = extraInfo;
   }
+  public String getLatestUpdateEndTimestamp() {
+    return latestUpdateEndTimestamp;
+  }
+
+  public void setLatestUpdateEndTimestamp(String latestUpdateEndTimestamp) {

Review comment: don't add parameters in tablestatus.

## File path: integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala ##
@@ -607,6 +609,14 @@ object CarbonDataRDDFactory {
     }
   }
+  def mockForConcurrentTest(): Unit = {

Review comment: remove it.

## File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/mutation/CarbonProjectForDeleteCommand.scala ##
@@ -25,9 +25,11 @@ import org.apache.spark.sql._
 import org.apache.spark.sql.catalyst.expressions.{Attribute, AttributeReference}
 import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
 import org.apache.spark.sql.execution.command._
+import org.apache.spark.sql.execution.command.mutation.transaction.{TransactionManager, TransactionType}
 import org.apache.spark.sql.types.LongType
 import org.apache.carbondata.common.logging.LogServiceFactory
+import org.apache.carbondata.core.exception.ConcurrentOperationException

Review comment: just use exception. don't use ConcurrentOperationException

## File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/mutation/CarbonProjectForUpdateCommand.scala ##
@@ -113,8 +116,14 @@ case class CarbonProjectForUpdateCommand(
       updatedRowCount = updatedRowCountTmp
       if (updatedRowCount == 0) return Seq(Row(0L))
+      if (IUDCommonUtil.isTest()) {

Review comment: remove it.

## File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/mutation/IUDCommonUtil.scala ##
@@ -268,4 +270,43 @@ object IUDCommonUtil {
       case _ =>
     }
   }
+
+  def checkIfSegmentsAlreadyUpdated(
+      carbonTable: CarbonTable,
+      startTimestamp: String,
+      updatedSegments: util.Set[String]): Boolean = {
+
+    val loadMetadataDetails = SegmentStatusManager.readLoadMetadata(carbonTable.getMetadataPath)
+    var isChanged = false
+    breakable {
+      loadMetadataDetails
+        .filter(load => updatedSegments.contains(load.getLoadName))
+        .foreach(load =>
+          if (load.getLatestUpdateEndTimestamp != null &&

Review comment: use getTransctionId(). don't use getLatestUpdateEndTimestamp.

## File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/mutation/CarbonProjectForUpdateCommand.scala ##
@@ -152,6 +161,9 @@ case class CarbonProjectForUpdateCommand(
       IndexStoreManager.getInstance()
         .clearInvalidSegments(carbonTable, deletedSegmentList.asScala.toList.asJava)
     } catch {
+      case e: ConcurrentOperationException =>

Review comment: use exception. remove ConcurrentOperationException

## File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/mutation/IUDCommonUtil.scala ##
@@ -268,4 +270,43 @@ object IUDCommonUtil {
       case _ =>
     }
   }
+
+  def checkIfSegmentsAlreadyUpdated(
+      carbonTable: CarbonTable,
+      startTimestamp: String,
+      updatedSegments: util.Set[String]): Boolean = {
+
+    val loadMetadataDetails = SegmentStatusManager.readLoadMetadata(
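The quoted `checkIfSegmentsAlreadyUpdated` is cut off before its comparison, so the exact rule is not visible. The general idea of such an optimistic check can still be sketched under an assumption: an operation conflicts if any segment it touched carries an end timestamp later than the operation's own start. The class, method, and parameter names below are hypothetical, and plain `long` timestamps replace the `String` values used in the diff.

```java
import java.util.Map;
import java.util.Set;

class SegmentConflictSketch {
  /**
   * latestUpdateEnd maps a segment name to the end time of its most recent
   * update; a missing entry or null means the segment was never updated.
   */
  static boolean anySegmentUpdatedSince(Map<String, Long> latestUpdateEnd,
      long startTimestamp, Set<String> touchedSegments) {
    for (String segment : touchedSegments) {
      Long end = latestUpdateEnd.get(segment);
      // Assumed rule: another operation finished on this segment after we started.
      if (end != null && end > startTimestamp) {
        return true;
      }
    }
    return false;
  }
}
```

The reviewer's request to key the check on a transaction id rather than on a new tablestatus field would change what is compared, not the shape of the loop.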
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4043: IUD Concurrency Improvement
CarbonDataQA2 commented on pull request #4043: URL: https://github.com/apache/carbondata/pull/4043#issuecomment-739523165
[GitHub] [carbondata] Kejian-Li opened a new pull request #4043: IUD Concurrency Improvement
Kejian-Li opened a new pull request #4043:
URL: https://github.com/apache/carbondata/pull/4043

Why is this PR needed?
Improve concurrency for Insert/Update/Delete.

What changes were proposed in this PR?
Remove the update lock in the Update and Delete commands and instead lock only the segments operated on by the Update and Delete commands.

Does this PR introduce any user interface change?
No

Is any new testcase added?
Yes
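Replacing one table-wide update lock with per-segment locks means two operations block each other only when their segment sets overlap. A minimal in-process sketch of that idea follows; it uses `ReentrantLock` as a stand-in for CarbonData's own lock objects, and the class and method names are hypothetical rather than taken from the PR.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

class SegmentLockSketch {
  private final ConcurrentHashMap<String, ReentrantLock> locks = new ConcurrentHashMap<>();

  /** Tries to lock every segment; on any failure releases what was taken and returns false. */
  boolean tryLockAll(Collection<String> segments) {
    List<ReentrantLock> acquired = new ArrayList<>();
    // Sorted order keeps the acquisition sequence deterministic across callers.
    for (String segment : new TreeSet<>(segments)) {
      ReentrantLock lock = locks.computeIfAbsent(segment, key -> new ReentrantLock());
      if (lock.tryLock()) {
        acquired.add(lock);
      } else {
        for (ReentrantLock held : acquired) {
          held.unlock();
        }
        return false;
      }
    }
    return true;
  }

  /** Releases the locks this thread holds on the given segments. */
  void unlockAll(Collection<String> segments) {
    for (String segment : segments) {
      ReentrantLock lock = locks.get(segment);
      if (lock != null && lock.isHeldByCurrentThread()) {
        lock.unlock();
      }
    }
  }
}
```

With this shape, an update on segments 0 and 1 and a delete on segment 2 can proceed together, while a second operation touching segment 1 fails fast instead of waiting.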
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4011: [CARBONDATA-4003] Improve IUD Concurrency
CarbonDataQA2 commented on pull request #4011: URL: https://github.com/apache/carbondata/pull/4011#issuecomment-739493182 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3302/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4011: [CARBONDATA-4003] Improve IUD Concurrency
CarbonDataQA2 commented on pull request #4011: URL: https://github.com/apache/carbondata/pull/4011#issuecomment-739493161 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3326/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4011: [CARBONDATA-4003] Improve IUD Concurrency
CarbonDataQA2 commented on pull request #4011: URL: https://github.com/apache/carbondata/pull/4011#issuecomment-739493108 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/5059/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4011: [CARBONDATA-4003] Improve IUD Concurrency
CarbonDataQA2 commented on pull request #4011: URL: https://github.com/apache/carbondata/pull/4011#issuecomment-739492980 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/5083/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4039: [WIP] Refactor and Fix Insert into partition issue with FileMergeSortComparator
CarbonDataQA2 commented on pull request #4039: URL: https://github.com/apache/carbondata/pull/4039#issuecomment-739492808 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3325/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4039: [WIP] Refactor and Fix Insert into partition issue with FileMergeSortComparator
CarbonDataQA2 commented on pull request #4039: URL: https://github.com/apache/carbondata/pull/4039#issuecomment-739492660 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/5082/
[GitHub] [carbondata] Kejian-Li commented on pull request #4011: [CARBONDATA-4003] Improve IUD Concurrency
Kejian-Li commented on pull request #4011: URL: https://github.com/apache/carbondata/pull/4011#issuecomment-739488366 retest this please
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4039: [WIP] Refactor and Fix Insert into partition issue with FileMergeSortComparator
CarbonDataQA2 commented on pull request #4039: URL: https://github.com/apache/carbondata/pull/4039#issuecomment-739486536 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/5058/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4039: [WIP] Refactor and Fix Insert into partition issue with FileMergeSortComparator
CarbonDataQA2 commented on pull request #4039: URL: https://github.com/apache/carbondata/pull/4039#issuecomment-739486393 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3301/
[GitHub] [carbondata] Indhumathi27 commented on pull request #4039: [WIP] Refactor and Fix Insert into partition issue with FileMergeSortComparator
Indhumathi27 commented on pull request #4039: URL: https://github.com/apache/carbondata/pull/4039#issuecomment-739483872 Retest this please
[GitHub] [carbondata] QiangCai closed pull request #4013: [CARBONDATA-4062] Make clean files as data trash manager
QiangCai closed pull request #4013: URL: https://github.com/apache/carbondata/pull/4013
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4013: [CARBONDATA-4062] Make clean files as data trash manager
CarbonDataQA2 commented on pull request #4013: URL: https://github.com/apache/carbondata/pull/4013#issuecomment-739480246 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3297/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4040: [WIP][CI TEST]
CarbonDataQA2 commented on pull request #4040: URL: https://github.com/apache/carbondata/pull/4040#issuecomment-739480247
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4031: [CARBONDATA-4073] Added FT for missing scenarios in Presto
CarbonDataQA2 commented on pull request #4031: URL: https://github.com/apache/carbondata/pull/4031#issuecomment-739480248 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3296/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4037: [CARBONDATA-4070] Added FT for SI and handled missed scenario.
CarbonDataQA2 commented on pull request #4037: URL: https://github.com/apache/carbondata/pull/4037#issuecomment-739480241
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4039: [WIP] Refactor and Fix Insert into partition issue with FileMergeSortComparator
CarbonDataQA2 commented on pull request #4039: URL: https://github.com/apache/carbondata/pull/4039#issuecomment-739480240
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4032: [WIP][CARBONDATA-4065] Support MERGE INTO SQL Command
CarbonDataQA2 commented on pull request #4032: URL: https://github.com/apache/carbondata/pull/4032#issuecomment-739480239 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3295/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4042: [CARBONDATA-4069] handled set streaming for SI table or table having SI.
CarbonDataQA2 commented on pull request #4042: URL: https://github.com/apache/carbondata/pull/4042#issuecomment-739480243 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3290/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4041: [CARBONDATA-4068] handled set long string on MT for column on which SI is already created.
CarbonDataQA2 commented on pull request #4041: URL: https://github.com/apache/carbondata/pull/4041#issuecomment-739480242 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/5049/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4012: [CARBONDATA-4051] Geo spatial index algorithm improvement and UDFs enhancement
CarbonDataQA2 commented on pull request #4012: URL: https://github.com/apache/carbondata/pull/4012#issuecomment-739480249