[GitHub] [carbondata] marchpure commented on pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

2020-10-18 Thread GitBox


marchpure commented on pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#issuecomment-711459338


   retest this please



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] marchpure commented on pull request #3981: [CARBONDATA-4031] Incorrect query result after Update/Delete and Inse…

2020-10-18 Thread GitBox


marchpure commented on pull request #3981:
URL: https://github.com/apache/carbondata/pull/3981#issuecomment-711459583


   retest this please







[GitHub] [carbondata] QiangCai commented on pull request #3934: [WIP] Support Global Unique Id for SegmentNo

2020-10-18 Thread GitBox


QiangCai commented on pull request #3934:
URL: https://github.com/apache/carbondata/pull/3934#issuecomment-711460106


   please close this PR and raise another PR to fix the listFiles issue.







[GitHub] [carbondata] nihal0107 commented on pull request #3985: [CARBONDATA-3965]Fixed float variable target datatype in case of adaptive encoding

2020-10-18 Thread GitBox


nihal0107 commented on pull request #3985:
URL: https://github.com/apache/carbondata/pull/3985#issuecomment-711465736


   retest this please.







[GitHub] [carbondata] shenjiayu17 commented on a change in pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

2020-10-18 Thread GitBox


shenjiayu17 commented on a change in pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#discussion_r507380031



##
File path: 
processing/src/main/java/org/apache/carbondata/processing/merger/CarbonDataMergerUtil.java
##
@@ -1246,8 +1209,22 @@ public static boolean isHorizontalCompactionEnabled() {
 // set the update status.
 segmentUpdateStatusManager.setUpdateStatusDetails(segmentUpdateDetails);
 
-CarbonFile[] deleteDeltaFiles =
-segmentUpdateStatusManager.getDeleteDeltaFilesList(new Segment(seg), blockName);
+// only when SegmentUpdateDetails contain the specified block
+// will the method getDeleteDeltaFilesList be executed
+List<String> blockNameList = segmentUpdateStatusManager.getBlockNameFromSegment(seg);
+Map<String, List<CarbonFile>> blockAndDeleteDeltaFilesMap = new HashMap<>();
+CarbonFile[] deleteDeltaFiles = null;
+if (blockNameList.contains(blockName)) {
+  blockAndDeleteDeltaFilesMap =
+  segmentUpdateStatusManager.getDeleteDeltaFilesList(new Segment(seg));
+}
+if (blockAndDeleteDeltaFilesMap.containsKey(blockName)) {
+  List<CarbonFile> deleteDeltaFileList = blockAndDeleteDeltaFilesMap.get(blockName);
+  deleteDeltaFiles = deleteDeltaFileList.toArray(new CarbonFile[deleteDeltaFileList.size()]);
+}
+
+// CarbonFile[] deleteDeltaFiles =

Review comment:
   Done









[GitHub] [carbondata] shenjiayu17 commented on a change in pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

2020-10-18 Thread GitBox


shenjiayu17 commented on a change in pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#discussion_r507380251



##
File path: 
processing/src/main/java/org/apache/carbondata/processing/merger/CarbonDataMergerUtil.java
##
@@ -1246,8 +1209,22 @@ public static boolean isHorizontalCompactionEnabled() {
 // set the update status.
 segmentUpdateStatusManager.setUpdateStatusDetails(segmentUpdateDetails);
 
-CarbonFile[] deleteDeltaFiles =
-segmentUpdateStatusManager.getDeleteDeltaFilesList(new Segment(seg), blockName);
+// only when SegmentUpdateDetails contain the specified block
+// will the method getDeleteDeltaFilesList be executed
+List<String> blockNameList = segmentUpdateStatusManager.getBlockNameFromSegment(seg);
+Map<String, List<CarbonFile>> blockAndDeleteDeltaFilesMap = new HashMap<>();
+CarbonFile[] deleteDeltaFiles = null;

Review comment:
   Done









[GitHub] [carbondata] shenjiayu17 commented on a change in pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

2020-10-18 Thread GitBox


shenjiayu17 commented on a change in pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#discussion_r507380663



##
File path: 
processing/src/main/java/org/apache/carbondata/processing/merger/CarbonDataMergerUtil.java
##
@@ -1246,8 +1209,22 @@ public static boolean isHorizontalCompactionEnabled() {
 // set the update status.
 segmentUpdateStatusManager.setUpdateStatusDetails(segmentUpdateDetails);
 
-CarbonFile[] deleteDeltaFiles =
-segmentUpdateStatusManager.getDeleteDeltaFilesList(new Segment(seg), blockName);
+// only when SegmentUpdateDetails contain the specified block
+// will the method getDeleteDeltaFilesList be executed
+List<String> blockNameList = segmentUpdateStatusManager.getBlockNameFromSegment(seg);
+Map<String, List<CarbonFile>> blockAndDeleteDeltaFilesMap = new HashMap<>();
+CarbonFile[] deleteDeltaFiles = null;
+if (blockNameList.contains(blockName)) {

Review comment:
   Done. Combined the two checks.
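
   For illustration only, here is a minimal sketch of one way the two checks could be merged: a single lookup in the per-segment map replaces the separate contains and containsKey tests. The updated patch itself is not shown in this message, so this is an assumption about its shape. The class and method names are hypothetical, and plain String stands in for CarbonFile so that the example is self-contained.

```java
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not part of CarbonDataMergerUtil.
final class DeleteDeltaLookupSketch {

  // Looks up the delete delta files of one block in a map that was built
  // once for the whole segment. Returns null when the block has no entry,
  // which mirrors the "deleteDeltaFiles = null" default in the quoted diff.
  static String[] deleteDeltaFilesFor(Map<String, List<String>> blockToFiles,
      String blockName) {
    List<String> files = blockToFiles.get(blockName);
    return files == null ? null : files.toArray(new String[0]);
  }
}
```

   The point of the design is that the per-segment map is computed once and each block costs only a hash lookup, instead of one file listing per block.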









[GitHub] [carbondata] shenjiayu17 commented on a change in pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

2020-10-18 Thread GitBox


shenjiayu17 commented on a change in pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#discussion_r507380824



##
File path: 
processing/src/main/java/org/apache/carbondata/processing/merger/CarbonDataMergerUtil.java
##
@@ -1246,8 +1209,22 @@ public static boolean isHorizontalCompactionEnabled() {
 // set the update status.
 segmentUpdateStatusManager.setUpdateStatusDetails(segmentUpdateDetails);
 
-CarbonFile[] deleteDeltaFiles =
-segmentUpdateStatusManager.getDeleteDeltaFilesList(new Segment(seg), blockName);
+// only when SegmentUpdateDetails contain the specified block
+// will the method getDeleteDeltaFilesList be executed
+List<String> blockNameList = segmentUpdateStatusManager.getBlockNameFromSegment(seg);
+Map<String, List<CarbonFile>> blockAndDeleteDeltaFilesMap = new HashMap<>();
+CarbonFile[] deleteDeltaFiles = null;
+if (blockNameList.contains(blockName)) {
+  blockAndDeleteDeltaFilesMap =
+  segmentUpdateStatusManager.getDeleteDeltaFilesList(new Segment(seg));
+}
+if (blockAndDeleteDeltaFilesMap.containsKey(blockName)) {

Review comment:
   Done









[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

2020-10-18 Thread GitBox


CarbonDataQA1 commented on pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#issuecomment-711479262


   Build Failed with Spark 2.4.5. Please check CI: http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2747/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

2020-10-18 Thread GitBox


CarbonDataQA1 commented on pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#issuecomment-711479603


   Build Failed with Spark 2.3.4. Please check CI: http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4501/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

2020-10-18 Thread GitBox


CarbonDataQA1 commented on pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#issuecomment-711484392


   Build Failed with Spark 2.4.5. Please check CI: http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2748/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

2020-10-18 Thread GitBox


CarbonDataQA1 commented on pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#issuecomment-711484706


   Build Failed with Spark 2.3.4. Please check CI: http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4502/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3981: [CARBONDATA-4031] Incorrect query result after Update/Delete and Inse…

2020-10-18 Thread GitBox


CarbonDataQA1 commented on pull request #3981:
URL: https://github.com/apache/carbondata/pull/3981#issuecomment-711487305


   Build Success with Spark 2.3.4. Please check CI: http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4499/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3981: [CARBONDATA-4031] Incorrect query result after Update/Delete and Inse…

2020-10-18 Thread GitBox


CarbonDataQA1 commented on pull request #3981:
URL: https://github.com/apache/carbondata/pull/3981#issuecomment-711489234


   Build Success with Spark 2.4.5. Please check CI: http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2745/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3985: [CARBONDATA-3965]Fixed float variable target datatype in case of adaptive encoding

2020-10-18 Thread GitBox


CarbonDataQA1 commented on pull request #3985:
URL: https://github.com/apache/carbondata/pull/3985#issuecomment-711500236


   Build Failed with Spark 2.3.4. Please check CI: http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4500/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3985: [CARBONDATA-3965]Fixed float variable target datatype in case of adaptive encoding

2020-10-18 Thread GitBox


CarbonDataQA1 commented on pull request #3985:
URL: https://github.com/apache/carbondata/pull/3985#issuecomment-711515303


   Build Failed with Spark 2.4.5. Please check CI: http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2746/
   







[GitHub] [carbondata] ajantha-bhat commented on pull request #3950: [CARBONDATA-3889] Enable scalastyle check for all scala test code

2020-10-18 Thread GitBox


ajantha-bhat commented on pull request #3950:
URL: https://github.com/apache/carbondata/pull/3950#issuecomment-711561462


   retest this please







[GitHub] [carbondata] nihal0107 commented on pull request #3985: [CARBONDATA-3965]Fixed float variable target datatype in case of adaptive encoding

2020-10-18 Thread GitBox


nihal0107 commented on pull request #3985:
URL: https://github.com/apache/carbondata/pull/3985#issuecomment-711567187


   retest this please.







[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3875: [CARBONDATA-3934]Support write transactional table with presto.

2020-10-18 Thread GitBox


ajantha-bhat commented on a change in pull request #3875:
URL: https://github.com/apache/carbondata/pull/3875#discussion_r507464156



##
File path: 
integration/hive/src/main/java/org/apache/carbondata/hive/MapredCarbonOutputFormat.java
##
@@ -92,6 +95,14 @@ public void checkOutputSpecs(FileSystem fileSystem, JobConf jobConf) throws IOEx
 }
 String tablePath = FileFactory.getCarbonFile(carbonLoadModel.getTablePath()).getAbsolutePath();
 TaskAttemptID taskAttemptID = TaskAttemptID.forName(jc.get("mapred.task.id"));
+// taskAttemptID will be null when the insert job is fired from presto. Presto send the JobConf
+// and since presto does not use the MR framework for execution, the mapred.task.id will be
+// null, so prepare a new ID.
+if (taskAttemptID == null) {
+  SimpleDateFormat formatter = new SimpleDateFormat("MMddHHmm");
+  String jobTrackerId = formatter.format(new Date());
+  taskAttemptID = new TaskAttemptID(jobTrackerId, 0, TaskType.MAP, 0, 0);

Review comment:
   Concurrent inserts may use the same taskAttemptID. Can you use a UUID as the taskAttemptID, or check how the ORC writer handles this?
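
   To make the concern concrete, here is a minimal sketch in plain Java with no Hadoop dependency. It shows that an identifier built only from a "MMddHHmm" timestamp is identical for two calls made at the same instant, while one that also carries a random UUID is not. The class and method names are hypothetical, and how such a value would be passed into TaskAttemptID is left open here.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.UUID;

// Hypothetical helper for illustration; not part of CarbonData or Hadoop.
final class AttemptIdSketch {

  // Mirrors the quoted patch: the identifier depends only on a
  // minute-resolution timestamp, so concurrent callers get the same value.
  static String timestampId(Date now) {
    return new SimpleDateFormat("MMddHHmm").format(now);
  }

  // Possible alternative: add a random UUID so each call yields its own value.
  static String uniqueId(Date now) {
    return timestampId(now) + "-" + UUID.randomUUID();
  }

  public static void main(String[] args) {
    Date now = new Date();
    // Two "concurrent" inserts observed at the same instant.
    System.out.println("timestamp ids equal: "
        + timestampId(now).equals(timestampId(now)));
    System.out.println("unique ids equal: "
        + uniqueId(now).equals(uniqueId(now)));
  }
}
```

   The timestamp is kept as a prefix only for readability; the uniqueness comes entirely from the UUID part.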

##
File path: 
integration/presto/src/main/prestosql/org/apache/carbondata/presto/CarbonDataFileWriter.java
##
@@ -0,0 +1,188 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.presto;
+
+import java.io.IOException;
+import java.io.UncheckedIOException;
+import java.util.Arrays;
+import java.util.List;
+import java.util.Properties;
+
+import org.apache.carbondata.common.logging.LogServiceFactory;
+import org.apache.carbondata.core.constants.CarbonCommonConstants;
+import org.apache.carbondata.hadoop.api.CarbonTableOutputFormat;
+import org.apache.carbondata.hive.CarbonHiveSerDe;
+import org.apache.carbondata.hive.MapredCarbonOutputFormat;
+import org.apache.carbondata.presto.impl.CarbonTableConfig;
+
+import com.google.common.collect.ImmutableList;
+import io.prestosql.plugin.hive.HiveFileWriter;
+import io.prestosql.plugin.hive.HiveType;
+import io.prestosql.plugin.hive.HiveWriteUtils;
+import io.prestosql.spi.Page;
+import io.prestosql.spi.PrestoException;
+import io.prestosql.spi.block.Block;
+import io.prestosql.spi.type.Type;
+import io.prestosql.spi.type.TypeManager;
+import org.apache.commons.lang3.StringUtils;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.ql.exec.FileSinkOperator;
+import org.apache.hadoop.hive.ql.io.HiveOutputFormat;
+import org.apache.hadoop.hive.ql.io.IOConstants;
+import org.apache.hadoop.hive.ql.io.parquet.serde.ArrayWritableObjectInspector;
+import org.apache.hadoop.hive.serde2.SerDeException;
+import org.apache.hadoop.hive.serde2.objectinspector.SettableStructObjectInspector;
+import org.apache.hadoop.hive.serde2.objectinspector.StructField;
+import org.apache.hadoop.io.Text;
+import org.apache.hadoop.mapred.JobConf;
+import org.apache.hadoop.mapred.Reporter;
+import org.apache.log4j.Logger;
+
+import static com.google.common.collect.ImmutableList.toImmutableList;
+import static io.prestosql.plugin.hive.HiveErrorCode.HIVE_WRITER_DATA_ERROR;
+import static java.util.Objects.requireNonNull;
+import static java.util.stream.Collectors.toList;
+import static org.apache.hadoop.hive.conf.HiveConf.ConfVars.COMPRESSRESULT;
+
+/**
+ * This class implements HiveFileWriter and it creates the carbonFileWriter to write the page data
+ * sent from presto.
+ */
+public class CarbonDataFileWriter implements HiveFileWriter {
+
+  private static final Logger LOG =
+  LogServiceFactory.getLogService(CarbonDataFileWriter.class.getName());
+
+  private final JobConf configuration;
+  private final Path outPutPath;
+  private final FileSinkOperator.RecordWriter recordWriter;
+  private final CarbonHiveSerDe serDe;
+  private final int fieldCount;
+  private final Object row;
+  private final SettableStructObjectInspector tableInspector;
+  private final List structFields;
+  private final HiveWriteUtils.FieldSetter[] setters;
+
+  private boolean isCommitDone;
+
+  public CarbonDataFileWriter(Path outPutPath, List inputColumnNames, 
Properties properties,
+  JobCon

[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3950: [CARBONDATA-3889] Enable scalastyle check for all scala test code

2020-10-18 Thread GitBox


CarbonDataQA1 commented on pull request #3950:
URL: https://github.com/apache/carbondata/pull/3950#issuecomment-71106


   Build Failed with Spark 2.3.4. Please check CI: http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4504/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3985: [CARBONDATA-3965]Fixed float variable target datatype in case of adaptive encoding

2020-10-18 Thread GitBox


CarbonDataQA1 commented on pull request #3985:
URL: https://github.com/apache/carbondata/pull/3985#issuecomment-711690989


   Build Success with Spark 2.3.4. Please check CI: http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4505/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3985: [CARBONDATA-3965]Fixed float variable target datatype in case of adaptive encoding

2020-10-18 Thread GitBox


CarbonDataQA1 commented on pull request #3985:
URL: https://github.com/apache/carbondata/pull/3985#issuecomment-711693802


   Build Success with Spark 2.4.5. Please check CI: http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2751/
   


