[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4043: IUD Concurrency Improvement

2020-12-06 Thread GitBox


CarbonDataQA2 commented on pull request #4043:
URL: https://github.com/apache/carbondata/pull/4043#issuecomment-739619817


   Build Failed with Spark 2.4.5. Please check CI: http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3329/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] shenjiayu17 commented on a change in pull request #4012: [CARBONDATA-4051] Geo spatial index algorithm improvement and UDFs enhancement

2020-12-06 Thread GitBox


shenjiayu17 commented on a change in pull request #4012:
URL: https://github.com/apache/carbondata/pull/4012#discussion_r533974578



##
File path: 
geo/src/main/java/org/apache/carbondata/geo/scan/expression/PolylineListExpression.java
##
@@ -0,0 +1,202 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.geo.scan.expression;
+
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
+
+import org.apache.carbondata.common.annotations.InterfaceAudience;
+import org.apache.carbondata.core.datastore.block.SegmentProperties;
+import org.apache.carbondata.core.metadata.datatype.DataTypes;
+import org.apache.carbondata.core.scan.expression.ColumnExpression;
+import org.apache.carbondata.core.scan.expression.Expression;
+import org.apache.carbondata.core.scan.expression.ExpressionResult;
+import org.apache.carbondata.core.scan.expression.UnknownExpression;
+import org.apache.carbondata.core.scan.expression.conditional.ConditionalExpression;
+import org.apache.carbondata.core.scan.filter.executer.FilterExecutor;
+import org.apache.carbondata.core.scan.filter.intf.ExpressionType;
+import org.apache.carbondata.core.scan.filter.intf.RowIntf;
+import org.apache.carbondata.core.scan.filter.resolver.FilterResolverIntf;
+import org.apache.carbondata.core.scan.filter.resolver.RowLevelFilterResolverImpl;
+import org.apache.carbondata.core.util.CustomIndex;
+import org.apache.carbondata.geo.GeoConstants;
+import org.apache.carbondata.geo.GeoHashIndex;
+import org.apache.carbondata.geo.GeoHashUtils;
+import org.apache.carbondata.geo.GeoOperationType;
+import org.apache.carbondata.geo.scan.filter.executor.PolygonFilterExecutorImpl;
+
+import org.locationtech.jts.geom.Coordinate;
+import org.locationtech.jts.geom.Geometry;
+import org.locationtech.jts.geom.GeometryFactory;
+import org.locationtech.jts.geom.LineString;
+import org.locationtech.jts.geom.Polygon;
+import org.locationtech.jts.io.WKTReader;
+import org.locationtech.jts.operation.buffer.BufferParameters;
+
+/**
+ * InPolylineList expression processor. It inputs the InPolylineList string to the Geo
+ * implementation's query method, gets a list of range of IDs from each polygon and
+ * calculates the and/or/diff range list to filter as an output. And then, build
+ * InExpression with list of all the IDs present in those list of ranges.
+ */
+@InterfaceAudience.Internal
+public class PolylineListExpression extends UnknownExpression
+    implements ConditionalExpression {
+
+  private static final GeometryFactory geoFactory = new GeometryFactory();
+
+  private String polylineString;
+
+  private Float bufferInMeter;
+
+  private GeoHashIndex instance;
+
+  private List ranges = new ArrayList();
+
+  private ColumnExpression column;
+
+  private static final ExpressionResult trueExpRes =
+      new ExpressionResult(DataTypes.BOOLEAN, true);
+
+  private static final ExpressionResult falseExpRes =
+      new ExpressionResult(DataTypes.BOOLEAN, false);
+
+  public PolylineListExpression(String polylineString, Float bufferInMeter, String columnName,
+      CustomIndex indexInstance) {
+    this.polylineString = polylineString;
+    this.bufferInMeter = bufferInMeter;
+    this.instance = (GeoHashIndex) indexInstance;
+    this.column = new ColumnExpression(columnName, DataTypes.LONG);
+  }
+
+  private void processExpression() {
+    try {
+      // transform the distance unit meter to degree
+      double buffer = bufferInMeter / GeoConstants.CONVERSION_FACTOR_OF_METER_TO_DEGREE;
+
+      // 1. parse the polyline list string and get polygon from each polyline
+      List polygonList = new ArrayList<>();
+      WKTReader wktReader = new WKTReader();
+      Pattern pattern = Pattern.compile(GeoConstants.POLYLINE_REG_EXPRESSION);
+      Matcher matcher = pattern.matcher(polylineString);
+      while (matcher.find()) {
+        String matchedStr = matcher.group();
+        LineString polylineCreatedFromStr = (LineString) wktReader.read(matchedStr);
+        Polygon polygonFromPolylineBuffer = (Polygon) polylineCreatedFromStr.buffer(
+            buffer, 0, BufferParameters.CA

[GitHub] [carbondata] QiangCai commented on pull request #4029: refact carbon util

2020-12-06 Thread GitBox


QiangCai commented on pull request #4029:
URL: https://github.com/apache/carbondata/pull/4029#issuecomment-739614278


   Please rebase and improve the title.







[GitHub] [carbondata] Zhangshunyu commented on pull request #4043: IUD Concurrency Improvement

2020-12-06 Thread GitBox


Zhangshunyu commented on pull request #4043:
URL: https://github.com/apache/carbondata/pull/4043#issuecomment-739612930


   retest this please







[GitHub] [carbondata] Zhangshunyu closed pull request #4040: [WIP][CI TEST]

2020-12-06 Thread GitBox


Zhangshunyu closed pull request #4040:
URL: https://github.com/apache/carbondata/pull/4040


   







[GitHub] [carbondata] Zhangshunyu commented on pull request #4040: [WIP][CI TEST]

2020-12-06 Thread GitBox


Zhangshunyu commented on pull request #4040:
URL: https://github.com/apache/carbondata/pull/4040#issuecomment-739609664


   retest this please







[GitHub] [carbondata] shenjiayu17 commented on pull request #4012: [CARBONDATA-4051] Geo spatial index algorithm improvement and UDFs enhancement

2020-12-06 Thread GitBox


shenjiayu17 commented on pull request #4012:
URL: https://github.com/apache/carbondata/pull/4012#issuecomment-739609378


   retest this please







[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4044: [CARBONDATA-4062] Refactor clean files feature

2020-12-06 Thread GitBox


CarbonDataQA2 commented on pull request #4044:
URL: https://github.com/apache/carbondata/pull/4044#issuecomment-739562913











[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4044: [CARBONDATA-4062] Refactor clean files feature

2020-12-06 Thread GitBox


CarbonDataQA2 commented on pull request #4044:
URL: https://github.com/apache/carbondata/pull/4044#issuecomment-739547305


   Build Failed with Spark 2.4.5. Please check CI: http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3304/
   







[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4044: [CARBONDATA-4062] Refactor clean files feature

2020-12-06 Thread GitBox


CarbonDataQA2 commented on pull request #4044:
URL: https://github.com/apache/carbondata/pull/4044#issuecomment-739547241


   Build Failed with Spark 2.3.4. Please check CI: http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/5085/
   







[GitHub] [carbondata] QiangCai opened a new pull request #4044: [CARBONDATA-4062] Refactor clean files feature

2020-12-06 Thread GitBox


QiangCai opened a new pull request #4044:
URL: https://github.com/apache/carbondata/pull/4044


### Why is this PR needed?
   To prevent accidental deletion of data, carbon will introduce trash data management. It provides a buffer period during which an accidental delete operation can be rolled back.

   Trash data management is part of carbon data lifecycle management. Clean files, acting as the data trash manager, should cover the following two parts.
   part 1: manage metadata-indexed data trash.
     This data stays at its original location in the table and is indexed by metadata. carbon manages this data through the metadata index and should avoid using the listFile() interface.
   part 2: manage the ".Trash" folder.
     For now the ".Trash" folder has no metadata index, and operations on it are based on timestamps and the listFile() interface. In the future, carbon will index the ".Trash" folder to improve data trash management.

### What changes were proposed in this PR?
   Remove the data cleaning function from all features, but keep the exception-handling part.
   Note: the following features still clean data:
   a) drop table/database/partition/index/mv
   b) insert/load overwrite table/partition
   Only the clean files function works as the data trash manager now.
   Support concurrent operation with other features (loading, compaction, update/delete, and so on).
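
The timestamp-based handling of the ".Trash" folder described under "Why is this PR needed?" could look roughly like the sketch below. This is only an illustration under stated assumptions: the class `TrashExpirySketch`, its method names, the retention parameter, and the use of last-modified time as the timestamp are all hypothetical and are not taken from the PR or from CarbonData's code.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Stream;

// Hypothetical sketch; none of these names exist in CarbonData.
public class TrashExpirySketch {

  // Lists the entries directly under trashDir together with a timestamp.
  // Assumption: the file's last-modified time stands in for the timestamp.
  public static Map<String, Long> listWithTimestamps(Path trashDir) {
    Map<String, Long> result = new TreeMap<>();
    try (Stream<Path> entries = Files.list(trashDir)) {
      for (Path entry : (Iterable<Path>) entries::iterator) {
        result.put(entry.getFileName().toString(),
            Files.getLastModifiedTime(entry).toMillis());
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return result;
  }

  // Returns the names of entries older than retentionMillis at nowMillis.
  public static List<String> selectExpired(
      Map<String, Long> timestamps, long nowMillis, long retentionMillis) {
    List<String> expired = new ArrayList<>();
    for (Map.Entry<String, Long> e : timestamps.entrySet()) {
      if (nowMillis - e.getValue() > retentionMillis) {
        expired.add(e.getKey());
      }
    }
    return expired;
  }

  public static void main(String[] args) throws IOException {
    Path trash = Files.createTempDirectory("trash-sketch");
    Files.createFile(trash.resolve("segment_0"));
    long now = System.currentTimeMillis();
    // A just-created entry is inside the retention window, so nothing expires.
    System.out.println(selectExpired(listWithTimestamps(trash), now, 60_000L));
  }
}
```

Separating the listing step from the expiry decision keeps the decision testable without a file system, and it marks the spot where a metadata index could later replace the directory listing.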
   
### Does this PR introduce any user interface change?
- Yes. (please explain the change and update document)
   
### Is any new testcase added?
- No
   
   
   







[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4043: IUD Concurrency Improvement

2020-12-06 Thread GitBox


CarbonDataQA2 commented on pull request #4043:
URL: https://github.com/apache/carbondata/pull/4043#issuecomment-739539209











[GitHub] [carbondata] marchpure commented on a change in pull request #4043: IUD Concurrency Improvement

2020-12-06 Thread GitBox


marchpure commented on a change in pull request #4043:
URL: https://github.com/apache/carbondata/pull/4043#discussion_r537069006



##
File path: 
integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/iud/IUDConcurrencyTestCase.scala
##
@@ -0,0 +1,474 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.spark.testsuite.iud
+
+import java.sql.Date
+import java.text.SimpleDateFormat
+import java.util.concurrent.{Callable, Executors, Future}
+
+import mockit.{Mock, MockUp}
+import org.apache.spark.sql.{DataFrame, Row, SaveMode}
+import org.apache.spark.sql.execution.command.mutation.{CarbonProjectForDeleteCommand, CarbonProjectForUpdateCommand}
+import org.apache.spark.sql.test.util.QueryTest
+import org.apache.spark.sql.types.StructType
+import org.scalatest.BeforeAndAfterAll
+
+import org.apache.carbondata.core.constants.CarbonCommonConstants
+import org.apache.carbondata.core.exception.ConcurrentOperationException
+import org.apache.carbondata.core.util.CarbonProperties
+import org.apache.carbondata.spark.rdd.CarbonDataRDDFactory
+
+class IUDConcurrencyTestCase extends QueryTest with BeforeAndAfterAll {
+
+  val ONE_LOAD_SIZE = 5
+  var testData: DataFrame = _
+
+  override def beforeAll(): Unit = {
+    sql("DROP DATABASE IF EXISTS iud_concurrency CASCADE")
+    sql("CREATE DATABASE iud_concurrency")
+    sql("USE iud_concurrency")
+
+    buildTestData()
+
+    createTable("orders", testData.schema)
+    createTable("temp_table", testData.schema)
+    createTable("orders_temp_table", testData.schema)
+
+    testData.write
+      .format("carbondata")
+      .option("tableName", "temp_table")
+      .option("tempCSV", "false")
+      .mode(SaveMode.Overwrite)
+      .save()
+
+    sql("insert into orders select * from temp_table")
+    sql("insert into orders_temp_table select * from temp_table")
+  }
+
+  private def buildTestData(): Unit = {
+    CarbonProperties.getInstance()
+      .addProperty(CarbonCommonConstants.CARBON_DATE_FORMAT, "yyyy-MM-dd")
+    import sqlContext.implicits._
+    val sdf = new SimpleDateFormat("yyyy-MM-dd")
+
+    testData = sqlContext.sparkSession.sparkContext.parallelize(1 to ONE_LOAD_SIZE)
+      .map(value => (value, new Date(sdf.parse("2015-07-" + (value % 10 + 10)).getTime),
+        "china", "aaa" + value, "phone" + 555 * value, "ASD" + (6 + value), 14999 + value,
+        "ordersTable" + value))
+      .toDF("o_id", "o_date", "o_country", "o_name",
+        "o_phonetype", "o_serialname", "o_salary", "o_comment")
+  }
+
+  private def createTable(tableName: String, schema: StructType): Unit = {
+    val schemaString = schema.fields.map(x => x.name + " " + x.dataType.typeName).mkString(", ")
+    sql(s"CREATE TABLE $tableName ($schemaString) stored as carbondata tblproperties" +
+      s"('sort_scope'='local_sort', 'sort_columns'='o_country, o_name, o_phonetype, o_serialname," +
+      s"o_comment")
+  }
+
+  // --- Insert and Update 
+  // update -> insert -> update
+  test("Update should success when insert completes before it") {
+    val updateSql = "update orders set (o_country)=('newCountry') where o_country='china'"
+    val insertSql = "insert into orders select * from orders_temp_table"
+
+    val mockInsert = new MockUp[CarbonProjectForUpdateCommand]() {
+      @Mock
+      def mockForConcurrentInsertTest(): Unit = {

Review comment:
   you shall not mock in this way.


[GitHub] [carbondata] marchpure commented on pull request #4043: IUD Concurrency Improvement

2020-12-06 Thread GitBox


marchpure commented on pull request #4043:
URL: https://github.com/apache/carbondata/pull/4043#issuecomment-739523754


   CI fails. Please fix the CI failures







[GitHub] [carbondata] marchpure commented on a change in pull request #4043: IUD Concurrency Improvement

2020-12-06 Thread GitBox


marchpure commented on a change in pull request #4043:
URL: https://github.com/apache/carbondata/pull/4043#discussion_r537066618



##
File path: 
core/src/main/java/org/apache/carbondata/core/mutate/CarbonUpdateUtil.java
##
@@ -341,8 +341,11 @@ public static boolean updateTableMetadataStatus(Set updatedSegmentsList,
   // this means for first time it is getting updated .
   loadMetadata.setUpdateDeltaStartTimestamp(updatedTimeStamp);
 }
-// update end timestamp for each time.
+// update delta end timestamp for each time.
 loadMetadata.setUpdateDeltaEndTimestamp(updatedTimeStamp);
+// record end timestamp of operation each time
+long operationEndTimestamp = System.currentTimeMillis();
+loadMetadata.setLatestUpdateEndTimestamp(String.valueOf(operationEndTimestamp));

Review comment:
   Please take care of the formatting.

##
File path: 
integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala
##
@@ -334,6 +334,8 @@ object CarbonDataRDDFactory {
 val segmentLock = CarbonLockFactory.getCarbonLockObj(carbonTable.getAbsoluteTableIdentifier,
   CarbonTablePath.addSegmentPrefix(carbonLoadModel.getSegmentId) + LockUsage.LOCK)
 
+mockForConcurrentTest()

Review comment:
   remove it

##
File path: 
core/src/main/java/org/apache/carbondata/core/statusmanager/LoadMetadataDetails.java
##
@@ -453,6 +456,14 @@ public void setExtraInfo(String extraInfo) {
 this.extraInfo = extraInfo;
   }
 
+  public String getLatestUpdateEndTimestamp() {
+return latestUpdateEndTimestamp;
+  }
+
+  public void setLatestUpdateEndTimestamp(String latestUpdateEndTimestamp) {

Review comment:
   don't add parameters in tablestatus.

##
File path: 
integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala
##
@@ -607,6 +609,14 @@ object CarbonDataRDDFactory {
 }
   }
 
+  def mockForConcurrentTest(): Unit = {

Review comment:
   remove it.

##
File path: 
integration/spark/src/main/scala/org/apache/spark/sql/execution/command/mutation/CarbonProjectForDeleteCommand.scala
##
@@ -25,9 +25,11 @@ import org.apache.spark.sql._
 import org.apache.spark.sql.catalyst.expressions.{Attribute, AttributeReference}
 import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
 import org.apache.spark.sql.execution.command._
+import org.apache.spark.sql.execution.command.mutation.transaction.{TransactionManager, TransactionType}
 import org.apache.spark.sql.types.LongType
 
 import org.apache.carbondata.common.logging.LogServiceFactory
+import org.apache.carbondata.core.exception.ConcurrentOperationException

Review comment:
   just use exception. don't use ConcurrentOperationException

##
File path: 
integration/spark/src/main/scala/org/apache/spark/sql/execution/command/mutation/CarbonProjectForUpdateCommand.scala
##
@@ -113,8 +116,14 @@ case class CarbonProjectForUpdateCommand(
   updatedRowCount = updatedRowCountTmp
   if (updatedRowCount == 0) return Seq(Row(0L))
 
+  if (IUDCommonUtil.isTest()) {

Review comment:
   remove it.

##
File path: 
integration/spark/src/main/scala/org/apache/spark/sql/execution/command/mutation/IUDCommonUtil.scala
##
@@ -268,4 +270,43 @@ object IUDCommonUtil {
   case _ =>
 }
   }
+
+  def checkIfSegmentsAlreadyUpdated(
+  carbonTable: CarbonTable,
+  startTimestamp: String,
+  updatedSegments: util.Set[String]): Boolean = {
+
+val loadMetadataDetails = SegmentStatusManager.readLoadMetadata(carbonTable.getMetadataPath)
+var isChanged = false
+breakable {
+  loadMetadataDetails
+.filter(load => updatedSegments.contains(load.getLoadName))
+.foreach(load =>
+  if (load.getLatestUpdateEndTimestamp != null &&

Review comment:
   use getTransctionId(). don't use getLatestUpdateEndTimestamp.

##
File path: 
integration/spark/src/main/scala/org/apache/spark/sql/execution/command/mutation/CarbonProjectForUpdateCommand.scala
##
@@ -152,6 +161,9 @@ case class CarbonProjectForUpdateCommand(
   IndexStoreManager.getInstance()
 .clearInvalidSegments(carbonTable, deletedSegmentList.asScala.toList.asJava)
 } catch {
+  case e: ConcurrentOperationException =>

Review comment:
   use exception. remove ConcurrentOperationException 


[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4043: IUD Concurrency Improvement

2020-12-06 Thread GitBox


CarbonDataQA2 commented on pull request #4043:
URL: https://github.com/apache/carbondata/pull/4043#issuecomment-739523165











[GitHub] [carbondata] Kejian-Li opened a new pull request #4043: IUD Concurrency Improvement

2020-12-06 Thread GitBox


Kejian-Li opened a new pull request #4043:
URL: https://github.com/apache/carbondata/pull/4043


   Why is this PR needed?
   Improve concurrency for Insert/Update/Delete
   
   What changes were proposed in this PR?
   Remove the update lock in the Update and Delete commands and lock the segments operated on by the Update and Delete commands.
   
   Does this PR introduce any user interface change?
   No
   Is any new testcase added?
   Yes
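
   The proposal above replaces a single update lock with locks on the segments that each Update or Delete touches. A minimal sketch of that idea follows, assuming an in-process owner map rather than CarbonData's actual lock implementation. The class `SegmentLockSketch`, its methods, and the operation ids are hypothetical and do not appear in the PR.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch; CarbonData's real locks are not modeled here.
public class SegmentLockSketch {

  // Maps a segment id to the id of the operation that currently holds it.
  private final ConcurrentHashMap<String, String> owners = new ConcurrentHashMap<>();

  // Tries to take every segment for operationId. If any segment is held by a
  // different operation, releases what this call took and returns false.
  public boolean tryLockAll(String operationId, Set<String> segmentIds) {
    List<String> acquired = new ArrayList<>();
    for (String segmentId : segmentIds) {
      String previous = owners.putIfAbsent(segmentId, operationId);
      if (previous == null) {
        acquired.add(segmentId);
      } else if (!previous.equals(operationId)) {
        for (String taken : acquired) {
          owners.remove(taken, operationId);
        }
        return false;
      }
    }
    return true;
  }

  // Releases only the segments that operationId actually holds.
  public void unlockAll(String operationId, Set<String> segmentIds) {
    for (String segmentId : segmentIds) {
      owners.remove(segmentId, operationId);
    }
  }

  public static void main(String[] args) {
    SegmentLockSketch locks = new SegmentLockSketch();
    Set<String> updateSegments = new HashSet<>(Arrays.asList("0", "1"));
    Set<String> deleteSegments = new HashSet<>(Arrays.asList("1", "2"));
    System.out.println(locks.tryLockAll("update-1", updateSegments)); // true
    System.out.println(locks.tryLockAll("delete-1", deleteSegments)); // false
    locks.unlockAll("update-1", updateSegments);
    System.out.println(locks.tryLockAll("delete-1", deleteSegments)); // true
  }
}
```

   Operations on disjoint segment sets proceed concurrently; only operations that share a segment conflict, and the losing operation holds nothing after it fails.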
   
   







[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4011: [CARBONDATA-4003] Improve IUD Concurrency

2020-12-06 Thread GitBox


CarbonDataQA2 commented on pull request #4011:
URL: https://github.com/apache/carbondata/pull/4011#issuecomment-739493182


   Build Failed with Spark 2.4.5. Please check CI: http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3302/
   







[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4011: [CARBONDATA-4003] Improve IUD Concurrency

2020-12-06 Thread GitBox


CarbonDataQA2 commented on pull request #4011:
URL: https://github.com/apache/carbondata/pull/4011#issuecomment-739493161


   Build Failed with Spark 2.4.5. Please check CI: http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3326/
   







[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4011: [CARBONDATA-4003] Improve IUD Concurrency

2020-12-06 Thread GitBox


CarbonDataQA2 commented on pull request #4011:
URL: https://github.com/apache/carbondata/pull/4011#issuecomment-739493108


   Build Failed with Spark 2.3.4. Please check CI: http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/5059/
   







[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4011: [CARBONDATA-4003] Improve IUD Concurrency

2020-12-06 Thread GitBox


CarbonDataQA2 commented on pull request #4011:
URL: https://github.com/apache/carbondata/pull/4011#issuecomment-739492980


   Build Failed with Spark 2.3.4. Please check CI: http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/5083/
   







[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4039: [WIP] Refactor and Fix Insert into partition issue with FileMergeSortComparator

2020-12-06 Thread GitBox


CarbonDataQA2 commented on pull request #4039:
URL: https://github.com/apache/carbondata/pull/4039#issuecomment-739492808


   Build Failed with Spark 2.4.5. Please check CI: http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3325/
   







[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4039: [WIP] Refactor and Fix Insert into partition issue with FileMergeSortComparator

2020-12-06 Thread GitBox


CarbonDataQA2 commented on pull request #4039:
URL: https://github.com/apache/carbondata/pull/4039#issuecomment-739492660


   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/5082/
   







[GitHub] [carbondata] Kejian-Li commented on pull request #4011: [CARBONDATA-4003] Improve IUD Concurrency

2020-12-06 Thread GitBox


Kejian-Li commented on pull request #4011:
URL: https://github.com/apache/carbondata/pull/4011#issuecomment-739488366


   retest this please







[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4039: [WIP] Refactor and Fix Insert into partition issue with FileMergeSortComparator

2020-12-06 Thread GitBox


CarbonDataQA2 commented on pull request #4039:
URL: https://github.com/apache/carbondata/pull/4039#issuecomment-739486536


   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/5058/
   







[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4039: [WIP] Refactor and Fix Insert into partition issue with FileMergeSortComparator

2020-12-06 Thread GitBox


CarbonDataQA2 commented on pull request #4039:
URL: https://github.com/apache/carbondata/pull/4039#issuecomment-739486393


   Build Failed  with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3301/
   







[GitHub] [carbondata] Indhumathi27 commented on pull request #4039: [WIP] Refactor and Fix Insert into partition issue with FileMergeSortComparator

2020-12-06 Thread GitBox


Indhumathi27 commented on pull request #4039:
URL: https://github.com/apache/carbondata/pull/4039#issuecomment-739483872


   Retest this please 







[GitHub] [carbondata] QiangCai closed pull request #4013: [CARBONDATA-4062] Make clean files as data trash manager

2020-12-06 Thread GitBox


QiangCai closed pull request #4013:
URL: https://github.com/apache/carbondata/pull/4013


   







[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4013: [CARBONDATA-4062] Make clean files as data trash manager

2020-12-06 Thread GitBox


CarbonDataQA2 commented on pull request #4013:
URL: https://github.com/apache/carbondata/pull/4013#issuecomment-739480246


   Build Failed  with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3297/
   







[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4040: [WIP][CI TEST]

2020-12-06 Thread GitBox


CarbonDataQA2 commented on pull request #4040:
URL: https://github.com/apache/carbondata/pull/4040#issuecomment-739480247











[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4031: [CARBONDATA-4073] Added FT for missing scenarios in Presto

2020-12-06 Thread GitBox


CarbonDataQA2 commented on pull request #4031:
URL: https://github.com/apache/carbondata/pull/4031#issuecomment-739480248


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3296/
   







[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4037: [CARBONDATA-4070] Added FT for SI and handled missed scenario.

2020-12-06 Thread GitBox


CarbonDataQA2 commented on pull request #4037:
URL: https://github.com/apache/carbondata/pull/4037#issuecomment-739480241











[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4039: [WIP] Refactor and Fix Insert into partition issue with FileMergeSortComparator

2020-12-06 Thread GitBox


CarbonDataQA2 commented on pull request #4039:
URL: https://github.com/apache/carbondata/pull/4039#issuecomment-739480240











[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4032: [WIP][CARBONDATA-4065] Support MERGE INTO SQL Command

2020-12-06 Thread GitBox


CarbonDataQA2 commented on pull request #4032:
URL: https://github.com/apache/carbondata/pull/4032#issuecomment-739480239


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3295/
   







[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4042: [CARBONDATA-4069] handled set streaming for SI table or table having SI.

2020-12-06 Thread GitBox


CarbonDataQA2 commented on pull request #4042:
URL: https://github.com/apache/carbondata/pull/4042#issuecomment-739480243


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3290/
   







[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4041: [CARBONDATA-4068] handled set long string on MT for column on which SI is already created.

2020-12-06 Thread GitBox


CarbonDataQA2 commented on pull request #4041:
URL: https://github.com/apache/carbondata/pull/4041#issuecomment-739480242


   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/5049/
   







[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4012: [CARBONDATA-4051] Geo spatial index algorithm improvement and UDFs enhancement

2020-12-06 Thread GitBox


CarbonDataQA2 commented on pull request #4012:
URL: https://github.com/apache/carbondata/pull/4012#issuecomment-739480249










