[jira] [Updated] (CARBONDATA-4134) MERGE INTO SQL Command is successful with wrong input for action

2021-02-18 Thread PURUJIT CHAUGULE (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

PURUJIT CHAUGULE updated CARBONDATA-4134:
-
Description: 
STEPS:

DROP TABLE IF EXISTS A;
 DROP TABLE IF EXISTS B;
 CREATE TABLE IF NOT EXISTS A(id Int,price Int, state String) STORED AS 
carbondata; 
 CREATE TABLE IF NOT EXISTS B(id Int, price Int,state String ) STORED AS 
carbondata;
 INSERT INTO A VALUES (1,100,"MA");
 INSERT INTO A VALUES (2,200,"NY");
 INSERT INTO A VALUES (3,300,"NH");
 INSERT INTO A VALUES (4,400,"FL");

INSERT INTO B VALUES (1,1,"MA (updated)");
 INSERT INTO B VALUES (2,3,"NY (updated)");
 INSERT INTO B VALUES (3,3,"CA (updated)");
 INSERT INTO B VALUES (5,5,"TX (updated)");
 INSERT INTO B VALUES (7,7,"LO (updated)");

0: jdbc:hive2://linux1:22550/> MERGE INTO A USING B ON A.ID=B.ID WHEN NOT MATCHED THEN;
+---------+
| Result  |
+---------+
+---------+
No rows selected
0: jdbc:hive2://linux1:22550/>
0: jdbc:hive2://linux1:22550/> MERGE INTO A USING B ON A.ID=B.ID WHEN NOT MATCHED THEN X;
+---------+
| Result  |
+---------+
+---------+
No rows selected
0: jdbc:hive2://linux1:22550/>
0: jdbc:hive2://linux1:22550/> MERGE INTO A USING B ON A.ID=B.ID WHEN MATCHED THEN a.b;
+---------+
| Result  |
+---------+
+---------+
No rows selected
0: jdbc:hive2://linux1:22550/>
0: jdbc:hive2://linux1:22550/> MERGE INTO A USING B ON A.ID=B.ID WHEN MATCHED THEN 1;
+---------+
| Result  |
+---------+
+---------+
No rows selected
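
For contrast, a complete action clause under a Spark-style MERGE grammar looks 
like the minimal sketch below (assuming a SparkSession `spark` with CarbonData 
extensions and the A/B tables above; the exact action forms accepted by 
CarbonData's parser are an assumption here). The point of this report is that 
the parser should reject the action-less inputs above instead of returning an 
empty success.

```
// Minimal sketch, assuming Spark-style MERGE actions: each WHEN clause must
// carry a complete action (UPDATE SET / DELETE / INSERT), which the inputs
// in this report lack.
spark.sql(
  """MERGE INTO A USING B ON A.ID = B.ID
    |WHEN MATCHED THEN UPDATE SET A.PRICE = B.PRICE, A.STATE = B.STATE
    |WHEN NOT MATCHED THEN INSERT (ID, PRICE, STATE) VALUES (B.ID, B.PRICE, B.STATE)
    |""".stripMargin)
```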

  was:
STEPS:

DROP TABLE IF EXISTS A;
DROP TABLE IF EXISTS B;
CREATE TABLE IF NOT EXISTS A(id Int,price Int, state String) STORED AS 
carbondata; 
CREATE TABLE IF NOT EXISTS B(id Int, price Int,state String ) STORED AS 
carbondata;
 INSERT INTO A VALUES (1,100,"MA");
 INSERT INTO A VALUES (2,200,"NY");
 INSERT INTO A VALUES (3,300,"NH");
 INSERT INTO A VALUES (4,400,"FL");

INSERT INTO B VALUES (1,1,"MA (updated)");
 INSERT INTO B VALUES (2,3,"NY (updated)");
 INSERT INTO B VALUES (3,3,"CA (updated)");
 INSERT INTO B VALUES (5,5,"TX (updated)");
 INSERT INTO B VALUES (7,7,"LO (updated)");

0: jdbc:hive2://linux1:22550/> MERGE INTO A USING B ON A.ID=B.ID WHEN NOT MATCHED THEN;
+---------+
| Result  |
+---------+
+---------+
No rows selected (16.256 seconds)
0: jdbc:hive2://linux1:22550/>
0: jdbc:hive2://linux1:22550/> MERGE INTO A USING B ON A.ID=B.ID WHEN NOT MATCHED THEN X;
+---------+
| Result  |
+---------+
+---------+
No rows selected (5.565 seconds)
0: jdbc:hive2://linux1:22550/>
0: jdbc:hive2://linux1:22550/> MERGE INTO A USING B ON A.ID=B.ID WHEN MATCHED THEN a.b;
+---------+
| Result  |
+---------+
+---------+
No rows selected (39.731 seconds)
0: jdbc:hive2://linux1:22550/>
0: jdbc:hive2://linux1:22550/> MERGE INTO A USING B ON A.ID=B.ID WHEN MATCHED THEN 1;
+---------+
| Result  |
+---------+
+---------+
No rows selected (14.754 seconds)


> MERGE INTO SQL Command is successful with wrong input for action
> 
>
> Key: CARBONDATA-4134
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4134
> Project: CarbonData
>  Issue Type: Bug
>  Components: data-query
>Affects Versions: 2.1.0
> Environment: Spark 2.4.5
>Reporter: PURUJIT CHAUGULE
>Priority: Minor
>
> STEPS:
> DROP TABLE IF EXISTS A;
>  DROP TABLE IF EXISTS B;
>  CREATE TABLE IF NOT EXISTS A(id Int,price Int, state String) STORED AS 
> carbondata; 
>  CREATE TABLE IF NOT EXISTS B(id Int, price Int,state String ) STORED AS 
> carbondata;
>  INSERT INTO A VALUES (1,100,"MA");
>  INSERT INTO A VALUES (2,200,"NY");
>  INSERT INTO A VALUES (3,300,"NH");
>  INSERT INTO A VALUES (4,400,"FL");
> INSERT INTO B VALUES (1,1,"MA (updated)");
>  INSERT INTO B VALUES (2,3,"NY (updated)");
>  INSERT INTO B VALUES (3,3,"CA (updated)");
>  INSERT INTO B VALUES (5,5,"TX (updated)");
>  INSERT INTO B VALUES (7,7,"LO (updated)");
> 0: jdbc:hive2://linux1:22550/> MERGE INTO A USING B ON A.ID=B.ID WHEN NOT MATCHED THEN;
> +---------+
> | Result  |
> +---------+
> +---------+
> No rows selected
> 0: jdbc:hive2://linux1:22550/>
> 0: jdbc:hive2://linux1:22550/> MERGE INTO A USING B ON A.ID=B.ID WHEN NOT MATCHED THEN X;
> +---------+
> | Result  |
> +---------+
> +---------+
> No rows selected
> 0: jdbc:hive2://linux1:22550/>
> 0: jdbc:hive2://linux1:22550/> MERGE INTO A USING B ON A.ID=B.ID WHEN MATCHED THEN a.b;
> +---------+
> | Result  |
> +---------+
> +---------+
> No rows selected
> 0: jdbc:hive2://linux1:22550/>
> 0: jdbc:hive2://linux1:22550/> MERGE INTO A USING B ON A.ID=B.ID WHEN MATCHED THEN 1;
> +---------+
> | Result  |
> +---------+
> +---------+
> No rows selected



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #4072: [CARBONDATA-4110] Support clean files dry run operation and show statistics after clean files operation

2021-02-18 Thread GitBox


vikramahuja1001 commented on a change in pull request #4072:
URL: https://github.com/apache/carbondata/pull/4072#discussion_r578990010



##
File path: 
core/src/main/java/org/apache/carbondata/core/util/CleanFilesUtil.java
##
@@ -71,14 +71,19 @@ public static void cleanStaleSegments(CarbonTable 
carbonTable)
 // Deleting the stale Segment folders and the segment file.
 try {
   CarbonUtil.deleteFoldersAndFiles(segmentPath);
+  LOGGER.info("Deleted the segment folder :" + 
segmentPath.getAbsolutePath() + " after"
+  + " moving it to the trash folder");
   // delete the segment file as well
   
FileFactory.deleteFile(CarbonTablePath.getSegmentFilePath(carbonTable.getTablePath(),
   staleSegmentFile));
+  LOGGER.info("Deleted stale segment file after moving it to the 
trash folder :"
+  + staleSegmentFile);

Review comment:
   Done. Actually, the segment file will not be moved to the trash; it will just 
be deleted straight away. Handled the same.

##
File path: 
core/src/main/java/org/apache/carbondata/core/util/CleanFilesUtil.java
##
@@ -71,14 +71,19 @@ public static void cleanStaleSegments(CarbonTable 
carbonTable)
 // Deleting the stale Segment folders and the segment file.
 try {
   CarbonUtil.deleteFoldersAndFiles(segmentPath);
+  LOGGER.info("Deleted the segment folder :" + 
segmentPath.getAbsolutePath() + " after"
+  + " moving it to the trash folder");
   // delete the segment file as well
   
FileFactory.deleteFile(CarbonTablePath.getSegmentFilePath(carbonTable.getTablePath(),
   staleSegmentFile));
+  LOGGER.info("Deleted stale segment file after moving it to the 
trash folder :"
+  + staleSegmentFile);
   for (String duplicateStaleSegmentFile : redundantSegmentFile) {
 if 
(DataFileUtil.getSegmentNoFromSegmentFile(duplicateStaleSegmentFile)
 .equals(segmentNumber)) {
   
FileFactory.deleteFile(CarbonTablePath.getSegmentFilePath(carbonTable
   .getTablePath(), duplicateStaleSegmentFile));
+  LOGGER.info("Deleted redundant segment file :" + 
duplicateStaleSegmentFile);

Review comment:
   done





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Closed] (CARBONDATA-4118) User Input for GeoID column not validated.

2021-02-18 Thread PURUJIT CHAUGULE (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

PURUJIT CHAUGULE closed CARBONDATA-4118.

Resolution: Duplicate

> User Input for GeoID column not validated.
> --
>
> Key: CARBONDATA-4118
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4118
> Project: CarbonData
>  Issue Type: Bug
>  Components: data-load
>Affects Versions: 2.1.0
>Reporter: PURUJIT CHAUGULE
>Priority: Minor
>
> * A user-supplied geoId value can be paired with multiple different pairs of 
> source column values, even though the correct internally calculated geoId 
> values for those source column values would differ.
>  * The advantage of using geoId is lost when the user input for the geoId 
> column is not validated. The geoId value is only generated internally if the 
> user does not input the geoId column.
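
For illustration, a minimal repro sketch of the two input paths (assumes a 
SparkSession `spark` with CarbonData extensions; the SPATIAL_INDEX property 
names follow the CarbonData spatial-index guide, and the table name, values, 
and insert arity are assumptions, not taken from this issue):

```
// Hypothetical geo table; property names per the CarbonData spatial-index guide.
spark.sql(
  """CREATE TABLE geo_source(timevalue BIGINT, longitude LONG, latitude LONG)
    |STORED AS carbondata
    |TBLPROPERTIES('SPATIAL_INDEX'='mygeohash',
    |  'SPATIAL_INDEX.mygeohash.type'='geohash',
    |  'SPATIAL_INDEX.mygeohash.sourcecolumns'='longitude, latitude',
    |  'SPATIAL_INDEX.mygeohash.originLatitude'='39.832277',
    |  'SPATIAL_INDEX.mygeohash.gridSize'='50',
    |  'SPATIAL_INDEX.mygeohash.conversionRatio'='1000000')""".stripMargin)

// geoId omitted: the mygeohash value is computed internally from the source columns.
spark.sql("INSERT INTO geo_source SELECT 1575428400000, 116285807, 40084087")

// geoId supplied by the user: per this issue, the value is accepted as-is and
// never validated against the (longitude, latitude) pair.
spark.sql("INSERT INTO geo_source SELECT 0, 1575428400000, 116285807, 40084087")
```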



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #4072: [CARBONDATA-4110] Support clean files dry run operation and show statistics after clean files operation

2021-02-18 Thread GitBox


vikramahuja1001 commented on a change in pull request #4072:
URL: https://github.com/apache/carbondata/pull/4072#discussion_r578987875



##
File path: 
core/src/main/java/org/apache/carbondata/core/metadata/SegmentFileStore.java
##
@@ -1135,13 +1135,16 @@ public static void deleteSegment(String tablePath, 
Segment segment,
 Map<String, List<String>> indexFilesMap = fileStore.getIndexFilesMap();
 for (Map.Entry<String, List<String>> entry : indexFilesMap.entrySet()) {
   FileFactory.deleteFile(entry.getKey());
+  LOGGER.info("File deleted after clean files operation: " + 
entry.getKey());
   for (String file : entry.getValue()) {
 String[] deltaFilePaths =
 updateStatusManager.getDeleteDeltaFilePath(file, 
segment.getSegmentNo());
 for (String deltaFilePath : deltaFilePaths) {
   FileFactory.deleteFile(deltaFilePath);
+  LOGGER.info("File deleted after clean files operation: " + 
deltaFilePath);

Review comment:
   done





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Created] (CARBONDATA-4134) MERGE INTO SQL Command is successful with wrong input for action

2021-02-18 Thread PURUJIT CHAUGULE (Jira)
PURUJIT CHAUGULE created CARBONDATA-4134:


 Summary: MERGE INTO SQL Command is successful with wrong input for 
action
 Key: CARBONDATA-4134
 URL: https://issues.apache.org/jira/browse/CARBONDATA-4134
 Project: CarbonData
  Issue Type: Bug
  Components: data-query
Affects Versions: 2.1.0
 Environment: Spark 2.4.5
Reporter: PURUJIT CHAUGULE


STEPS:

DROP TABLE IF EXISTS A;
DROP TABLE IF EXISTS B;
CREATE TABLE IF NOT EXISTS A(id Int,price Int, state String) STORED AS 
carbondata; 
CREATE TABLE IF NOT EXISTS B(id Int, price Int,state String ) STORED AS 
carbondata;
 INSERT INTO A VALUES (1,100,"MA");
 INSERT INTO A VALUES (2,200,"NY");
 INSERT INTO A VALUES (3,300,"NH");
 INSERT INTO A VALUES (4,400,"FL");

INSERT INTO B VALUES (1,1,"MA (updated)");
 INSERT INTO B VALUES (2,3,"NY (updated)");
 INSERT INTO B VALUES (3,3,"CA (updated)");
 INSERT INTO B VALUES (5,5,"TX (updated)");
 INSERT INTO B VALUES (7,7,"LO (updated)");

0: jdbc:hive2://linux1:22550/> MERGE INTO A USING B ON A.ID=B.ID WHEN NOT MATCHED THEN;
+---------+
| Result  |
+---------+
+---------+
No rows selected (16.256 seconds)
0: jdbc:hive2://linux1:22550/>
0: jdbc:hive2://linux1:22550/> MERGE INTO A USING B ON A.ID=B.ID WHEN NOT MATCHED THEN X;
+---------+
| Result  |
+---------+
+---------+
No rows selected (5.565 seconds)
0: jdbc:hive2://linux1:22550/>
0: jdbc:hive2://linux1:22550/> MERGE INTO A USING B ON A.ID=B.ID WHEN MATCHED THEN a.b;
+---------+
| Result  |
+---------+
+---------+
No rows selected (39.731 seconds)
0: jdbc:hive2://linux1:22550/>
0: jdbc:hive2://linux1:22550/> MERGE INTO A USING B ON A.ID=B.ID WHEN MATCHED THEN 1;
+---------+
| Result  |
+---------+
+---------+
No rows selected (14.754 seconds)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #4072: [CARBONDATA-4110] Support clean files dry run operation and show statistics after clean files operation

2021-02-18 Thread GitBox


vikramahuja1001 commented on a change in pull request #4072:
URL: https://github.com/apache/carbondata/pull/4072#discussion_r578982799



##
File path: 
core/src/main/java/org/apache/carbondata/core/metadata/SegmentFileStore.java
##
@@ -1135,13 +1135,16 @@ public static void deleteSegment(String tablePath, 
Segment segment,
 Map<String, List<String>> indexFilesMap = fileStore.getIndexFilesMap();
 for (Map.Entry<String, List<String>> entry : indexFilesMap.entrySet()) {
   FileFactory.deleteFile(entry.getKey());
+  LOGGER.info("File deleted after clean files operation: " + 
entry.getKey());

Review comment:
   done





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #4072: [CARBONDATA-4110] Support clean files dry run operation and show statistics after clean files operation

2021-02-18 Thread GitBox


vikramahuja1001 commented on a change in pull request #4072:
URL: https://github.com/apache/carbondata/pull/4072#discussion_r578982527



##
File path: 
core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentStatusManager.java
##
@@ -983,7 +983,28 @@ public static boolean 
isLoadInProgress(AbsoluteTableIdentifier absoluteTableIden
 }
   }
 
-  private static boolean isLoadDeletionRequired(LoadMetadataDetails[] details) 
{
+  public static boolean isExpiredSegment(LoadMetadataDetails oneLoad, 
AbsoluteTableIdentifier
+  absoluteTableIdentifier) {
+boolean result = false;

Review comment:
   done





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #4072: [CARBONDATA-4110] Support clean files dry run operation and show statistics after clean files operation

2021-02-18 Thread GitBox


vikramahuja1001 commented on a change in pull request #4072:
URL: https://github.com/apache/carbondata/pull/4072#discussion_r578982092



##
File path: 
core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentStatusManager.java
##
@@ -983,7 +983,28 @@ public static boolean 
isLoadInProgress(AbsoluteTableIdentifier absoluteTableIden
 }
   }
 
-  private static boolean isLoadDeletionRequired(LoadMetadataDetails[] details) 
{
+  public static boolean isExpiredSegment(LoadMetadataDetails oneLoad, 
AbsoluteTableIdentifier
+  absoluteTableIdentifier) {
+boolean result = false;
+if (oneLoad.getSegmentStatus() == SegmentStatus.COMPACTED || 
oneLoad.getSegmentStatus() ==
+SegmentStatus.MARKED_FOR_DELETE) {
+  return true;

Review comment:
   done





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #4072: [CARBONDATA-4110] Support clean files dry run operation and show statistics after clean files operation

2021-02-18 Thread GitBox


vikramahuja1001 commented on a change in pull request #4072:
URL: https://github.com/apache/carbondata/pull/4072#discussion_r578981989



##
File path: 
core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentStatusManager.java
##
@@ -983,7 +983,28 @@ public static boolean 
isLoadInProgress(AbsoluteTableIdentifier absoluteTableIden
 }
   }
 
-  private static boolean isLoadDeletionRequired(LoadMetadataDetails[] details) 
{
+  public static boolean isExpiredSegment(LoadMetadataDetails oneLoad, 
AbsoluteTableIdentifier
+  absoluteTableIdentifier) {
+boolean result = false;
+if (oneLoad.getSegmentStatus() == SegmentStatus.COMPACTED || 
oneLoad.getSegmentStatus() ==
+SegmentStatus.MARKED_FOR_DELETE) {
+  return true;
+} else if (oneLoad.getSegmentStatus() == SegmentStatus.INSERT_IN_PROGRESS 
|| oneLoad
+.getSegmentStatus() == SegmentStatus.INSERT_OVERWRITE_IN_PROGRESS) {
+  // check if lock can be acquired
+  ICarbonLock segmentLock = 
CarbonLockFactory.getCarbonLockObj(absoluteTableIdentifier,
+  CarbonTablePath.addSegmentPrefix(oneLoad.getLoadName()) + 
LockUsage.LOCK);
+  if (segmentLock.lockWithRetries()) {
+result = true;
+segmentLock.unlock();

Review comment:
   done
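
As a usage sketch of the method added in this hunk (a hypothetical caller; the 
paths and table names are placeholders, and the method may still be renamed per 
the review): COMPACTED and MARKED_FOR_DELETE segments are always expired, while 
in-progress segments count as expired only when their segment lock can be 
acquired, i.e. no load is actually running on them.

```
import org.apache.carbondata.core.metadata.AbsoluteTableIdentifier
import org.apache.carbondata.core.statusmanager.SegmentStatusManager

// List the segments that clean files could treat as expired.
val identifier = AbsoluteTableIdentifier.from("/store/db/t1", "db", "t1")
val details = SegmentStatusManager.readLoadMetadata("/store/db/t1/Metadata")
val expired = details.filter(load => SegmentStatusManager.isExpiredSegment(load, identifier))
expired.foreach(load => println(s"expired segment: ${load.getLoadName}"))
```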





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #4072: [CARBONDATA-4110] Support clean files dry run operation and show statistics after clean files operation

2021-02-18 Thread GitBox


vikramahuja1001 commented on a change in pull request #4072:
URL: https://github.com/apache/carbondata/pull/4072#discussion_r578974314



##
File path: core/src/main/java/org/apache/carbondata/core/util/TrashUtil.java
##
@@ -152,46 +153,77 @@ public static void copyFilesToTrash(List 
filesToCopy,
* The below method deletes timestamp subdirectories in the trash folder 
which have expired as
* per the user defined retention time
*/
-  public static void deleteExpiredDataFromTrash(String tablePath) {
+  public static long[] deleteExpiredDataFromTrash(String tablePath, Boolean 
isDryRun) {
 CarbonFile trashFolder = FileFactory.getCarbonFile(CarbonTablePath
 .getTrashFolderPath(tablePath));
+long sizeFreed = 0;
+long trashFolderSize = 0;
 // Deleting the timestamp based subdirectories in the trashfolder by the 
given timestamp.
 try {
   if (trashFolder.isFileExist()) {
+trashFolderSize = 
FileFactory.getDirectorySize(trashFolder.getAbsolutePath());
 CarbonFile[] timestampFolderList = trashFolder.listFiles();
+List<CarbonFile> filesToDelete = new ArrayList<>();
 for (CarbonFile timestampFolder : timestampFolderList) {
   // If the timeStamp at which the timeStamp subdirectory has expired as per the user
   // defined value, delete the complete timeStamp subdirectory
-  if (timestampFolder.isDirectory() && isTrashRetentionTimeoutExceeded(Long
-  .parseLong(timestampFolder.getName()))) {
-FileFactory.deleteAllCarbonFilesOfDir(timestampFolder);
-LOGGER.info("Timestamp subfolder from the Trash folder deleted: " + timestampFolder
+  if (isTrashRetentionTimeoutExceeded(Long.parseLong(timestampFolder.getName()))) {
+if (timestampFolder.isDirectory()) {
+  sizeFreed += FileFactory.getDirectorySize(timestampFolder.getAbsolutePath());
+  filesToDelete.add(timestampFolder);
+}
+  }
+}
+if (!isDryRun) {
+  for (CarbonFile carbonFile : filesToDelete) {
+LOGGER.info("Timestamp subfolder from the Trash folder deleted: " 
+ carbonFile
 .getAbsolutePath());
+FileFactory.deleteAllCarbonFilesOfDir(carbonFile);
   }
 }
   }
 } catch (IOException e) {
   LOGGER.error("Error during deleting expired timestamp folder from the 
trash folder", e);
 }
+return new long[] {sizeFreed, trashFolderSize - sizeFreed};
   }
 
   /**
* The below method deletes all the files and folders in the trash folder of 
a carbon table.
+   * Returns an array in which the first element contains the size freed in 
case of clean files
+   * operation or size that can be freed in case of dry run and the second 
element contains the
+   * remaining size.
*/
-  public static void emptyTrash(String tablePath) {
+  public static long[] emptyTrash(String tablePath, Boolean isDryRun) {
 CarbonFile trashFolder = FileFactory.getCarbonFile(CarbonTablePath
 .getTrashFolderPath(tablePath));
 // if the trash folder exists delete the contents of the trash folder
+long sizeFreed = 0;
+long[] sizeStatistics = new long[]{0, 0};
 try {
   if (trashFolder.isFileExist()) {
 CarbonFile[] carbonFileList = trashFolder.listFiles();
+List<CarbonFile> filesToDelete = new ArrayList<>();
 for (CarbonFile carbonFile : carbonFileList) {
-  FileFactory.deleteAllCarbonFilesOfDir(carbonFile);
+  sizeFreed += 
FileFactory.getDirectorySize(carbonFile.getAbsolutePath());

Review comment:
   done





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Created] (CARBONDATA-4133) Concurrent Insert Overwrite with static partition on Index server fails

2021-02-18 Thread SHREELEKHYA GAMPA (Jira)
SHREELEKHYA GAMPA created CARBONDATA-4133:
-

 Summary: Concurrent Insert Overwrite with static partition on 
Index server fails
 Key: CARBONDATA-4133
 URL: https://issues.apache.org/jira/browse/CARBONDATA-4133
 Project: CarbonData
  Issue Type: Bug
Reporter: SHREELEKHYA GAMPA


[Steps] :-

With the Index Server running, execute the concurrent insert overwrite with 
static partition.

 

Set 0:
CREATE TABLE if not exists uniqdata_string(CUST_ID int,CUST_NAME String,DOB 
timestamp,DOJ timestamp, BIGINT_COLUMN1 bigint,BIGINT_COLUMN2 
bigint,DECIMAL_COLUMN1 decimal(30,10),DECIMAL_COLUMN2 
decimal(36,10),Double_COLUMN1 double, Double_COLUMN2 double,INTEGER_COLUMN1 
int) PARTITIONED BY(ACTIVE_EMUI_VERSION string) STORED AS carbondata 
TBLPROPERTIES ('TABLE_BLOCKSIZE'= '256 MB');

Set 1:
LOAD DATA INPATH 'hdfs://hacluster/BabuStore/Data/2000_UniqData.csv' into table 
uniqdata_string partition(active_emui_version='abc') 
OPTIONS('FILEHEADER'='CUST_ID,CUST_NAME ,ACTIVE_EMUI_VERSION,DOB,DOJ, 
BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1, 
Double_COLUMN2,INTEGER_COLUMN1','BAD_RECORDS_ACTION'='FORCE');
LOAD DATA INPATH 'hdfs://hacluster/datasets/2000_UniqData.csv' into table 
uniqdata_string partition(active_emui_version='abc') 
OPTIONS('FILEHEADER'='CUST_ID,CUST_NAME ,ACTIVE_EMUI_VERSION,DOB,DOJ, 
BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1, 
Double_COLUMN2,INTEGER_COLUMN1','BAD_RECORDS_ACTION'='FORCE');

Set 2:
CREATE TABLE if not exists uniqdata_hive (CUST_ID int,CUST_NAME 
String,ACTIVE_EMUI_VERSION string, DOB timestamp, DOJ timestamp, BIGINT_COLUMN1 
bigint,BIGINT_COLUMN2 bigint,DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2 
decimal(36,10),Double_COLUMN1 double, Double_COLUMN2 double, INTEGER_COLUMN1 
int)ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
load data local inpath "/opt/csv/2000_UniqData.csv" into table uniqdata_hive;

Set 3: (concurrent)
insert overwrite table uniqdata_string partition(active_emui_version='abc') 
select CUST_ID, CUST_NAME,DOB,doj, bigint_column1, bigint_column2, 
decimal_column1, decimal_column2,double_column1, double_column2,integer_column1 
from uniqdata_hive limit 10;
insert overwrite table uniqdata_string partition(active_emui_version='abc') 
select CUST_ID, CUST_NAME,DOB,doj, bigint_column1, bigint_column2, 
decimal_column1, decimal_column2,double_column1, double_column2,integer_column1 
from uniqdata_hive limit 10;

[Expected Result] :- Insert should be successful for timestamp data in the Hive 
Carbon partition table.

[Actual Issue] :- Concurrent Insert Overwrite with static partition on Index 
Server fails.

[!https://clouddevops.huawei.com/vision-file-storage/api/file/download/upload-v2/2021/1/17/c71035/a40a6d6be1434b1db8e8c1c6f5a2e97b/image.png!|https://clouddevops.huawei.com/vision-file-storage/api/file/download/upload-v2/2021/1/17/c71035/a40a6d6be1434b1db8e8c1c6f5a2e97b/image.png]
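
A sketch of driving the Set 3 statements concurrently from a single driver 
instead of two sessions (assumes a running SparkSession `spark` with CarbonData 
extensions; in the report they were run from separate JDBC sessions):

```
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Fire the two identical overwrites of Set 3 at the same time; with the
// Index Server enabled, this is the combination that fails per this issue.
val overwrite =
  "insert overwrite table uniqdata_string partition(active_emui_version='abc') " +
  "select CUST_ID, CUST_NAME, DOB, doj, bigint_column1, bigint_column2, " +
  "decimal_column1, decimal_column2, double_column1, double_column2, " +
  "integer_column1 from uniqdata_hive limit 10"
val runs = Seq.fill(2)(Future(spark.sql(overwrite)))
Await.result(Future.sequence(runs), 10.minutes)
```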



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #4072: [CARBONDATA-4110] Support clean files dry run operation and show statistics after clean files operation

2021-02-18 Thread GitBox


vikramahuja1001 commented on a change in pull request #4072:
URL: https://github.com/apache/carbondata/pull/4072#discussion_r578967973



##
File path: core/src/main/java/org/apache/carbondata/core/util/TrashUtil.java
##
@@ -152,46 +153,77 @@ public static void copyFilesToTrash(List 
filesToCopy,
* The below method deletes timestamp subdirectories in the trash folder 
which have expired as
* per the user defined retention time
*/
-  public static void deleteExpiredDataFromTrash(String tablePath) {
+  public static long[] deleteExpiredDataFromTrash(String tablePath, Boolean 
isDryRun) {

Review comment:
   done





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #4072: [CARBONDATA-4110] Support clean files dry run operation and show statistics after clean files operation

2021-02-18 Thread GitBox


vikramahuja1001 commented on a change in pull request #4072:
URL: https://github.com/apache/carbondata/pull/4072#discussion_r578967247



##
File path: core/src/main/java/org/apache/carbondata/core/util/TrashUtil.java
##
@@ -152,46 +153,77 @@ public static void copyFilesToTrash(List 
filesToCopy,
* The below method deletes timestamp subdirectories in the trash folder 
which have expired as
* per the user defined retention time
*/
-  public static void deleteExpiredDataFromTrash(String tablePath) {
+  public static long[] deleteExpiredDataFromTrash(String tablePath, Boolean 
isDryRun) {
 CarbonFile trashFolder = FileFactory.getCarbonFile(CarbonTablePath
 .getTrashFolderPath(tablePath));
+long sizeFreed = 0;
+long trashFolderSize = 0;
 // Deleting the timestamp based subdirectories in the trashfolder by the 
given timestamp.
 try {
   if (trashFolder.isFileExist()) {
+trashFolderSize = 
FileFactory.getDirectorySize(trashFolder.getAbsolutePath());

Review comment:
   done





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #4072: [CARBONDATA-4110] Support clean files dry run operation and show statistics after clean files operation

2021-02-18 Thread GitBox


vikramahuja1001 commented on a change in pull request #4072:
URL: https://github.com/apache/carbondata/pull/4072#discussion_r578963097



##
File path: docs/clean-files.md
##
@@ -64,4 +64,40 @@ The stale_inprogress option with force option will delete 
Marked for delete, Com
 
   ```
   CLEAN FILES FOR TABLE TABLE_NAME options('stale_inprogress'='true', 
'force'='true')
-  ```
\ No newline at end of file
+  ```
+### DRY RUN OPTION
+Clean files also supports a dry run option which will let the user know how 
much space will be freed 
+during the actual clean files operation. The dry run operation will not delete 
any data but will just give
+size-based statistics on the data. The dry run operation will return two columns 
where the first will 
+show how much space will be freed by that clean files operation and the second 
column will show the 
+remaining stale data (data which can be deleted but has not yet expired as per 
the ```max.query.execution.time``` and ```carbon.trash.retention.days``` values).
+By default the value of the ```dryrun``` option is ```false```.
+
+Dry Run Operation is supported with four types of commands:
+  ```
+  CLEAN FILES FOR TABLE TABLE_NAME options('dryrun'='true')
+  ```
+  ```
+  CLEAN FILES FOR TABLE TABLE_NAME options('force'='true', 'dryrun'='true')
+  ```
+  ```
+  CLEAN FILES FOR TABLE TABLE_NAME 
options('stale_inprogress'='true','dryrun'='true')
+  ```
+
+  ```
+  CLEAN FILES FOR TABLE TABLE_NAME options('stale_inprogress'='true', 
'force'='true','dryrun'='true')
+  ```
+
+**NOTE**:
+  * Since the dry run operation will calculate sizes and will access file-level 
APIs, the operation can
+  be a costly and time-consuming operation in the case of tables with a large 
number of segments.

Review comment:
   done

##
File path: docs/clean-files.md
##
@@ -64,4 +64,40 @@ The stale_inprogress option with force option will delete 
Marked for delete, Com
 
   ```
   CLEAN FILES FOR TABLE TABLE_NAME options('stale_inprogress'='true', 
'force'='true')
-  ```
\ No newline at end of file
+  ```
+### DRY RUN OPTION
+Clean files also supports a dry run option which will let the user know how 
much space will be freed 
+during the actual clean files operation. The dry run operation will not delete 
any data but will just give
+size-based statistics on the data. The dry run operation will return two columns 
where the first will 

Review comment:
   done
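
For reference, a usage sketch of the dry run option being documented above 
(assumes a SparkSession `spark` with CarbonData extensions; the table name is a 
placeholder, and the column semantics are as described in the doc text):

```
// A dry run reports the two size columns without deleting anything; the same
// command without 'dryrun'='true' performs the actual clean files operation.
val stats = spark.sql(
  "CLEAN FILES FOR TABLE my_db.my_table OPTIONS('force'='true', 'dryrun'='true')")
stats.show(false) // col 1: size to be freed; col 2: remaining stale data size
```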





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #4072: [CARBONDATA-4110] Support clean files dry run operation and show statistics after clean files operation

2021-02-18 Thread GitBox


vikramahuja1001 commented on a change in pull request #4072:
URL: https://github.com/apache/carbondata/pull/4072#discussion_r578962180



##
File path: 
integration/spark/src/main/scala/org/apache/carbondata/trash/DataTrashManager.scala
##
@@ -87,13 +108,70 @@ object DataTrashManager {
 }
   }
 
-  private def checkAndCleanTrashFolder(carbonTable: CarbonTable, 
isForceDelete: Boolean): Unit = {
+  /**
+   * Checks the size of the segment files as well as datafiles, this method is 
used before and after
+   * clean files operation to check how much space is actually freed, during 
the operation.
+   */
+  def getSizeScreenshot(carbonTable: CarbonTable): Long = {
+val segmentPath = 
CarbonTablePath.getSegmentFilesLocation(carbonTable.getTablePath)
+var size : Long = 0
+if (!carbonTable.isHivePartitionTable) {
+  if (carbonTable.getTableInfo.getFactTable.getTableProperties.containsKey(
+  CarbonCommonConstants.FLAT_FOLDER)) {
+// the size is table size + segment folder size - (metadata folder 
size + lockFiles size)
+(FileFactory.getDirectorySize(carbonTable.getTablePath) + 
FileFactory.getDirectorySize(

Review comment:
   done

##
File path: 
integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/events/CleanFilesPostEventListener.scala
##
@@ -56,7 +56,6 @@ class CleanFilesPostEventListener extends 
OperationEventListener with Logging {
   cleanFilesPostEvent.carbonTable,
   cleanFilesPostEvent.options.getOrElse("force", "false").toBoolean,
   cleanFilesPostEvent.options.getOrElse("stale_inprogress", 
"false").toBoolean)
-

Review comment:
   done





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] akashrn5 commented on a change in pull request #4072: [CARBONDATA-4110] Support clean files dry run operation and show statistics after clean files operation

2021-02-18 Thread GitBox


akashrn5 commented on a change in pull request #4072:
URL: https://github.com/apache/carbondata/pull/4072#discussion_r578921513



##
File path: 
core/src/main/java/org/apache/carbondata/core/metadata/SegmentFileStore.java
##
@@ -1135,13 +1135,16 @@ public static void deleteSegment(String tablePath, 
Segment segment,
 Map<String, List<String>> indexFilesMap = fileStore.getIndexFilesMap();
 for (Map.Entry<String, List<String>> entry : indexFilesMap.entrySet()) {
   FileFactory.deleteFile(entry.getKey());
+  LOGGER.info("File deleted after clean files operation: " + 
entry.getKey());
   for (String file : entry.getValue()) {
 String[] deltaFilePaths =
 updateStatusManager.getDeleteDeltaFilePath(file, 
segment.getSegmentNo());
 for (String deltaFilePath : deltaFilePaths) {
   FileFactory.deleteFile(deltaFilePath);
+  LOGGER.info("File deleted after clean files operation: " + 
deltaFilePath);

Review comment:
   Instead of logging each file name for every delete, which will increase these 
logs when many delta files are there, you can log once for all files in line 1147 
after the loop completes, along with the actual block path. Or else you can say 
"deleted the block file (print the block file name) and the corresponding delta 
files", as the timestamp will be the same, I guess. Just check once and add.

##
File path: 
core/src/main/java/org/apache/carbondata/core/metadata/SegmentFileStore.java
##
@@ -1135,13 +1135,16 @@ public static void deleteSegment(String tablePath, 
Segment segment,
 Map<String, List<String>> indexFilesMap = fileStore.getIndexFilesMap();
 for (Map.Entry<String, List<String>> entry : indexFilesMap.entrySet()) {
   FileFactory.deleteFile(entry.getKey());
+  LOGGER.info("File deleted after clean files operation: " + 
entry.getKey());

Review comment:
   ```suggestion
 LOGGER.info("Deleted  file: " + entry.getKey() +  ", on clean files" );
   ```
   
   Can do the same for all.

##
File path: core/src/main/java/org/apache/carbondata/core/util/TrashUtil.java
##
@@ -152,46 +153,77 @@ public static void copyFilesToTrash(List 
filesToCopy,
* The below method deletes timestamp subdirectories in the trash folder 
which have expired as
* per the user defined retention time
*/
-  public static void deleteExpiredDataFromTrash(String tablePath) {
+  public static long[] deleteExpiredDataFromTrash(String tablePath, Boolean 
isDryRun) {

Review comment:
   update the method comment based on method signature changes and return 
values, follow the same for other also if changed

##
File path: docs/clean-files.md
##
@@ -64,4 +64,40 @@ The stale_inprogress option with force option will delete 
Marked for delete, Com
 
   ```
   CLEAN FILES FOR TABLE TABLE_NAME options('stale_inprogress'='true', 
'force'='true')
-  ```
\ No newline at end of file
+  ```
+### DRY RUN OPTION
+Clean files also supports a dry run option which will let the user know how 
much space will be freed 
+during the actual clean files operation. The dry run operation will not delete 
any data but will just give
+size-based statistics on the data. The dry run operation will return two columns 
where the first will 
+show how much space will be freed by that clean files operation and the second 
column will show the 
+remaining stale data (data which can be deleted but has not yet expired as per 
the ```max.query.execution.time``` and ```carbon.trash.retention.days``` values).
+By default the value of the ```dryrun``` option is ```false```.
+
+Dry Run Operation is supported with four types of commands:
+  ```
+  CLEAN FILES FOR TABLE TABLE_NAME options('dryrun'='true')
+  ```
+  ```
+  CLEAN FILES FOR TABLE TABLE_NAME options('force'='true', 'dryrun'='true')
+  ```
+  ```
+  CLEAN FILES FOR TABLE TABLE_NAME 
options('stale_inprogress'='true','dryrun'='true')
+  ```
+
+  ```
+  CLEAN FILES FOR TABLE TABLE_NAME options('stale_inprogress'='true', 
'force'='true','dryrun'='true')
+  ```
+
+**NOTE**:
+  * Since the dry run operation will calculate sizes and will access file-level 
APIs, the operation can
+  be a costly and time-consuming operation in the case of tables with a large 
number of segments.

Review comment:
   Better to add a point here that when dry run is true, the other options do not 
matter except force = true; I hope you have handled this in the code.

##
File path: 
core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentStatusManager.java
##
@@ -983,7 +983,28 @@ public static boolean 
isLoadInProgress(AbsoluteTableIdentifier absoluteTableIden
 }
   }
 
-  private static boolean isLoadDeletionRequired(LoadMetadataDetails[] details) 
{
+  public static boolean isExpiredSegment(LoadMetadataDetails oneLoad, 
AbsoluteTableIdentifier
+  absoluteTableIdentifier) {
+boolean result = false;

Review comment:
   rename to `isExpiredSegment`


[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #4086: [CARBONDATA-4115] Successful load and insert will return segment ID

2021-02-18 Thread GitBox


Indhumathi27 commented on a change in pull request #4086:
URL: https://github.com/apache/carbondata/pull/4086#discussion_r578944568



##
File path: 
integration/spark/src/test/scala/org/apache/spark/util/CarbonCommandSuite.scala
##
@@ -82,6 +83,43 @@ class CarbonCommandSuite extends QueryTest with 
BeforeAndAfterAll {
""".stripMargin)
   }
 
+  protected def createTestTable(tableName: String): Unit = {
+sql(
+  s"""

Review comment:
   can you add in 
TestLoadGeneral/TestLoadDataWithAutoLoadMerge/InsertIntoCarbonTableTestCase. 
For partition - StandardPartitionTableLoadingTestCase





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4081: [WIP]Secondary Index based pruning without spark query plan modification

2021-02-18 Thread GitBox


CarbonDataQA2 commented on pull request #4081:
URL: https://github.com/apache/carbondata/pull/4081#issuecomment-781565780


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3326/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA2 commented on pull request #3988: [CARBONDATA-4037] Improve the table status and segment file writing

2021-02-18 Thread GitBox


CarbonDataQA2 commented on pull request #3988:
URL: https://github.com/apache/carbondata/pull/3988#issuecomment-781565586


   Build Failed  with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3328/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4086: [CARBONDATA-4115] Successful load and insert will return segment ID

2021-02-18 Thread GitBox


CarbonDataQA2 commented on pull request #4086:
URL: https://github.com/apache/carbondata/pull/4086#issuecomment-781563381


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/5082/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4081: [WIP]Secondary Index based pruning without spark query plan modification

2021-02-18 Thread GitBox


CarbonDataQA2 commented on pull request #4081:
URL: https://github.com/apache/carbondata/pull/4081#issuecomment-781563045


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/5083/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4072: [CARBONDATA-4110] Support clean files dry run operation and show statistics after clean files operation

2021-02-18 Thread GitBox


CarbonDataQA2 commented on pull request #4072:
URL: https://github.com/apache/carbondata/pull/4072#issuecomment-781562499


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/5084/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4072: [CARBONDATA-4110] Support clean files dry run operation and show statistics after clean files operation

2021-02-18 Thread GitBox


CarbonDataQA2 commented on pull request #4072:
URL: https://github.com/apache/carbondata/pull/4072#issuecomment-781561697


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3327/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4086: [CARBONDATA-4115] Successful load and insert will return segment ID

2021-02-18 Thread GitBox


CarbonDataQA2 commented on pull request #4086:
URL: https://github.com/apache/carbondata/pull/4086#issuecomment-781561235


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3325/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4088: [CARBONDATA-4121] Prepriming is not working in Index Server.

2021-02-18 Thread GitBox


CarbonDataQA2 commented on pull request #4088:
URL: https://github.com/apache/carbondata/pull/4088#issuecomment-781561196


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/5081/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA2 commented on pull request #3988: [CARBONDATA-4037] Improve the table status and segment file writing

2021-02-18 Thread GitBox


CarbonDataQA2 commented on pull request #3988:
URL: https://github.com/apache/carbondata/pull/3988#issuecomment-781560963


   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/5086/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4088: [CARBONDATA-4121] Prepriming is not working in Index Server.

2021-02-18 Thread GitBox


CarbonDataQA2 commented on pull request #4088:
URL: https://github.com/apache/carbondata/pull/4088#issuecomment-781558153


   Build Failed  with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3324/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Resolved] (CARBONDATA-4121) Prepriming is not working in index server

2021-02-18 Thread Kunal Kapoor (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Kapoor resolved CARBONDATA-4121.
--
Fix Version/s: (was: 2.0.1)
   2.1.1
   Resolution: Fixed

> Prepriming is not working in index server
> -
>
> Key: CARBONDATA-4121
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4121
> Project: CarbonData
>  Issue Type: Bug
>  Components: data-load
>Affects Versions: 2.0.0
>Reporter: Karan
>Priority: Major
> Fix For: 2.1.1
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> Prepriming is always executed in an async thread. Server.getRemoteUser in an 
> async thread causes NPE, which crashes the index server application.
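
A common pattern for avoiding this class of NPE is to capture the caller's 
identity on the RPC handler thread and run the async work under it. A sketch of 
that general Hadoop UGI pattern follows (an illustration, not necessarily the 
exact fix in the PR):

```
import java.security.PrivilegedAction
import org.apache.hadoop.security.UserGroupInformation

// Capture the caller's UGI while still on the RPC handler thread, then run the
// prepriming work inside doAs, so nothing on the async thread needs to resolve
// the remote user (which is unavailable off the handler thread).
val ugi = UserGroupInformation.getCurrentUser
new Thread(() => {
  ugi.doAs(new PrivilegedAction[Unit] {
    override def run(): Unit = {
      // ... prepriming work that needs the caller's identity ...
    }
  })
}).start()
```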



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [carbondata] asfgit closed pull request #4088: [CARBONDATA-4121] Prepriming is not working in Index Server.

2021-02-18 Thread GitBox


asfgit closed pull request #4088:
URL: https://github.com/apache/carbondata/pull/4088


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] kunal642 commented on pull request #4088: [CARBONDATA-4121] Prepriming is not working in Index Server.

2021-02-18 Thread GitBox


kunal642 commented on pull request #4088:
URL: https://github.com/apache/carbondata/pull/4088#issuecomment-781519825


   LGTM



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4045: ci_test

2021-02-18 Thread GitBox


CarbonDataQA2 commented on pull request #4045:
URL: https://github.com/apache/carbondata/pull/4045#issuecomment-781504936


   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/5085/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Resolved] (CARBONDATA-4126) Concurrent Compaction fails with Load on table with SI

2021-02-18 Thread Kunal Kapoor (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Kapoor resolved CARBONDATA-4126.
--
Fix Version/s: 2.1.1
   Resolution: Fixed

> Concurrent Compaction fails with Load on table with SI
> --
>
> Key: CARBONDATA-4126
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4126
> Project: CarbonData
>  Issue Type: Bug
>  Components: data-load
>Affects Versions: 2.1.0
> Environment: Spark 2.4.5
>Reporter: Chetan Bhat
>Priority: Major
> Fix For: 2.1.1
>
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> [Steps] :-
> Create table, load data and create SI.
> create table brinjal (imei string,AMSize string,channelsId 
> string,ActiveCountry string, Activecity string,gamePointId 
> double,deviceInformationId double,productionDate Timestamp,deliveryDate 
> timestamp,deliverycharge double) stored as carbondata 
> TBLPROPERTIES('table_blocksize'='1');
> LOAD DATA INPATH 'hdfs://hacluster/chetan/vardhandaterestruct.csv' INTO TABLE 
> brinjal OPTIONS('DELIMITER'=',', 'QUOTECHAR'= 
> '"','BAD_RECORDS_ACTION'='FORCE','FILEHEADER'= 
> 'imei,deviceInformationId,AMSize,channelsId,ActiveCountry,Activecity,gamePointId,productionDate,deliveryDate,deliverycharge');
> create index indextable1 ON TABLE brinjal (AMSize) AS 'carbondata';
>  
> From one terminal, load data into the table, and from another terminal, perform 
> minor and major compaction on the table concurrently for some time.
> LOAD DATA INPATH 'hdfs://hacluster/chetan/vardhandaterestruct.csv' INTO TABLE 
> brinjal OPTIONS('DELIMITER'=',', 'QUOTECHAR'= 
> '"','BAD_RECORDS_ACTION'='FORCE','FILEHEADER'= 
> 'imei,deviceInformationId,AMSize,channelsId,ActiveCountry,Activecity,gamePointId,productionDate,deliveryDate,deliverycharge');
> alter table brinjal compact 'minor';
> alter table brinjal compact 'major';
>  
> [Expected Result] :-  Concurrent Compaction should be success with Load on 
> table with SI
>  
> [Actual Issue] : - Concurrent Compaction fails with Load on table with SI
> *0: jdbc:hive2://linux-32:22550/> alter table brinjal compact 'major';*
> *Error: org.apache.spark.sql.AnalysisException: Compaction failed. Please 
> check logs for more info. Exception in compaction Failed to acquire lock on 
> segment 2, during compaction of table test.brinjal; (state=,code=0)*
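
A sketch of driving the load and the compactions concurrently from a single 
driver instead of two terminals (assumes a running SparkSession `spark` with 
CarbonData extensions; statements as in the steps above):

```
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// One future keeps loading while the other issues minor and major compaction,
// reproducing the segment-lock contention described in this issue.
val load = Future(spark.sql(
  "LOAD DATA INPATH 'hdfs://hacluster/chetan/vardhandaterestruct.csv' INTO TABLE brinjal " +
  "OPTIONS('DELIMITER'=',', 'QUOTECHAR'='\"', 'BAD_RECORDS_ACTION'='FORCE', 'FILEHEADER'=" +
  "'imei,deviceInformationId,AMSize,channelsId,ActiveCountry,Activecity,gamePointId," +
  "productionDate,deliveryDate,deliverycharge')"))
val compact = Future {
  spark.sql("alter table brinjal compact 'minor'")
  spark.sql("alter table brinjal compact 'major'")
}
Await.result(Future.sequence(Seq(load, compact)), 30.minutes)
```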



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [carbondata] asfgit closed pull request #4093: [CARBONDATA-4126] Concurrent compaction failed with load on table.

2021-02-18 Thread GitBox


asfgit closed pull request #4093:
URL: https://github.com/apache/carbondata/pull/4093


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] kunal642 commented on pull request #4093: [CARBONDATA-4126] Concurrent compaction failed with load on table.

2021-02-18 Thread GitBox


kunal642 commented on pull request #4093:
URL: https://github.com/apache/carbondata/pull/4093#issuecomment-781419513


   LGTM



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Created] (CARBONDATA-4132) Number of records not matching in MVs

2021-02-18 Thread suyash yadav (Jira)
suyash yadav created CARBONDATA-4132:


 Summary: Number of records not matching in MVs
 Key: CARBONDATA-4132
 URL: https://issues.apache.org/jira/browse/CARBONDATA-4132
 Project: CarbonData
  Issue Type: Improvement
  Components: core
Affects Versions: 2.0.1
 Environment: Apache carbondata 2.0.1
Reporter: suyash yadav
 Fix For: 2.0.1


Hi Team,

We are working on a POC where we need to insert 300k records/second into a table 
on which we have already created timeseries MVs with minute, hour, and day 
granularity.

As per our expectation, the minute-based MV should contain 300K records until the 
next minute's data is inserted. Likewise, the hour- and day-based MVs should 
contain 300K records until the next hour's and next day's data arrive, 
respectively.

But the count of records in the MV is not coming out as per our expectation. It 
is always more than expected.

The strange thing is, when we drop the MV and recreate it after inserting the 
data into the table, the count of records comes out correct. So it is clear there 
is no problem with the MV definition or the data.

Kindly help us resolve this issue on priority. Please find more details below:

Table definition:

===

spark.sql("create table Flow_Raw_TS(export_ms bigint,exporter_ip 
string,pkt_seq_num bigint,flow_seq_num int,src_ip string,dst_ip 
string,protocol_id smallint,src_tos smallint,dst_tos smallint,raw_src_tos 
smallint,raw_dst_tos smallint,src_mask smallint,dst_mask smallint,tcp_bits 
int,src_port int,in_if_id bigint,in_if_entity_id bigint,in_if_enabled 
boolean,dst_port int,out_if_id bigint,out_if_entity_id bigint,out_if_enabled 
boolean,direction smallint,in_octets bigint,out_octets bigint,in_packets 
bigint,out_packets bigint,next_hop_ip string,bgp_src_as_num 
bigint,bgp_dst_as_num bigint,bgp_next_hop_ip string,end_ms timestamp,start_ms 
timestamp,app_id string,app_name string,src_ip_group string,dst_ip_group 
string,policy_qos_classification_hierarchy string,policy_qos_queue_id 
bigint,worker_id int,day bigint ) stored as carbondata TBLPROPERTIES 
('local_dictionary_enable'='false')



MV definition:

 

==

+*Minute based*+

spark.sql("create materialized view Flow_Raw_TS_agg_001_min as select 
timeseries(end_ms,'minute') as 
end_ms,src_ip,dst_ip,app_name,in_if_id,src_tos,src_ip_group,dst_ip_group,protocol_id,bgp_src_as_num,
 bgp_dst_as_num,policy_qos_classification_hierarchy, 
policy_qos_queue_id,sum(in_octets) as octects, sum(in_packets) as packets, 
sum(out_packets) as out_packets, sum(out_octets) as out_octects FROM 
Flow_Raw_TS group by 
timeseries(end_ms,'minute'),src_ip,dst_ip,app_name,in_if_id,src_tos,src_ip_group,
 
dst_ip_group,protocol_id,bgp_src_as_num,bgp_dst_as_num,policy_qos_classification_hierarchy,
 policy_qos_queue_id").show()

+*Hour Based*+

val startTime = System.nanoTime
spark.sql("create materialized view Flow_Raw_TS_agg_001_hour as select 
timeseries(end_ms,'hour') as end_ms,app_name,sum(in_octets) as octects, 
sum(in_packets) as packets, sum(out_packets) as out_packets, sum(out_octets) as 
out_octects, in_if_id,src_tos,src_ip_group, dst_ip_group,protocol_id,src_ip, 
dst_ip,bgp_src_as_num, bgp_dst_as_num,policy_qos_classification_hierarchy, 
policy_qos_queue_id FROM Flow_Raw_TS group by 
timeseries(end_ms,'hour'),in_if_id,app_name,src_tos,src_ip_group,dst_ip_group,protocol_id,src_ip,
 dst_ip,bgp_src_as_num,bgp_dst_as_num,policy_qos_classification_hierarchy, 
policy_qos_queue_id").show()
val endTime = System.nanoTime
val elapsedSeconds = (endTime - startTime) / 1e9d
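
A verification sketch for the mismatch described above (assumes the table and 
minute MV as defined; the MV row count should equal the number of distinct group 
keys that a direct aggregation over the raw table produces):

```
// Compare rows materialized in the minute MV against the number of distinct
// group keys from a direct aggregation over the raw table.
val mvCount = spark.sql("select count(*) from Flow_Raw_TS_agg_001_min")
  .collect()(0).getLong(0)
val expected = spark.sql(
  """select count(*) from (
    |  select timeseries(end_ms,'minute') as end_ms, src_ip, dst_ip, app_name,
    |    in_if_id, src_tos, src_ip_group, dst_ip_group, protocol_id,
    |    bgp_src_as_num, bgp_dst_as_num, policy_qos_classification_hierarchy,
    |    policy_qos_queue_id
    |  from Flow_Raw_TS
    |  group by timeseries(end_ms,'minute'), src_ip, dst_ip, app_name, in_if_id,
    |    src_tos, src_ip_group, dst_ip_group, protocol_id, bgp_src_as_num,
    |    bgp_dst_as_num, policy_qos_classification_hierarchy, policy_qos_queue_id) t
    |""".stripMargin).collect()(0).getLong(0)
println(s"mv=$mvCount expected=$expected match=${mvCount == expected}")
```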



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4072: [CARBONDATA-4110] Support clean files dry run operation and show statistics after clean files operation

2021-02-18 Thread GitBox


CarbonDataQA2 commented on pull request #4072:
URL: https://github.com/apache/carbondata/pull/4072#issuecomment-781286159


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5494/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4072: [CARBONDATA-4110] Support clean files dry run operation and show statistics after clean files operation

2021-02-18 Thread GitBox


CarbonDataQA2 commented on pull request #4072:
URL: https://github.com/apache/carbondata/pull/4072#issuecomment-781284062


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3730/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4072: [CARBONDATA-4110] Support clean files dry run operation and show statistics after clean files operation

2021-02-18 Thread GitBox


CarbonDataQA2 commented on pull request #4072:
URL: https://github.com/apache/carbondata/pull/4072#issuecomment-781209057


   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5492/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4072: [CARBONDATA-4110] Support clean files dry run operation and show statistics after clean files operation

2021-02-18 Thread GitBox


CarbonDataQA2 commented on pull request #4072:
URL: https://github.com/apache/carbondata/pull/4072#issuecomment-781208770


   Build Failed  with Spark 2.4.5, Please check CI 
http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3729/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org