[jira] [Commented] (CARBONDATA-4041) carbondata-processing's apache-spark versions and vulnerabilities

2020-12-13 Thread Ajantha Bhat (Jira)


[ 
https://issues.apache.org/jira/browse/CARBONDATA-4041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17248781#comment-17248781
 ] 

Ajantha Bhat commented on CARBONDATA-4041:
--

[https://spark.apache.org/security.html]

 

Based on this, *CVE-2020-9480* is only a problem for RPC calls in standalone 
clusters. Can you provide more details about the article that says the 
spark-unsafe 2.4.5 dependency has this CVE?

 

If the unsafe jar does have a problem, we may have to explicitly add a 
dependency on 2.4.6 and exclude 2.4.5, but I am not sure about compatibility.

 

We do plan to upgrade Spark as a whole to 3.x. It might take a few months from 
now to finish the integration.

> carbondata-processing's apache-spark versions and vulnerabilities
> -
>
> Key: CARBONDATA-4041
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4041
> Project: CarbonData
>  Issue Type: Improvement
>  Components: other
>Affects Versions: 2.0.1
>Reporter: openlookeng
>Priority: Blocker
>
> carbondata-processing depends on the spark-unsafe 2.4.5 component, which has 
> the vulnerability *CVE-2020-9480*. Does the team have a plan to update it?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (CARBONDATA-4040) Data mismatch incase of compaction failure and retry success

2020-12-13 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat resolved CARBONDATA-4040.
--
Fix Version/s: 2.2.0
   Resolution: Fixed

https://github.com/apache/carbondata/pull/3999

> Data mismatch incase of compaction failure and retry success
> 
>
> Key: CARBONDATA-4040
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4040
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Ajantha Bhat
>Assignee: Ajantha Bhat
>Priority: Major
> Fix For: 2.2.0
>
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> For compaction we don't register an in-progress segment, so compaction can 
> fail when it is unable to get the table status lock. At that point the 
> partial compaction segment needs to be cleaned up, but the cleanup itself can 
> fail because the lock is unavailable or because of IO issues. When the user 
> retries the compaction, carbon uses the same segment id. So, while writing 
> the segment file for the new compaction, list only the files that map to the 
> current compaction, not all the files in the segment, which may include stale 
> files.
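
The filtering described above can be sketched roughly as follows. This is only an illustration in Python, not the code from the pull request: the file-name layout and the `load_start_time` marker are assumptions made for the example, as is the helper name `files_for_current_compaction`.

```python
from typing import Iterable, List


def files_for_current_compaction(file_names: Iterable[str],
                                 load_start_time: str) -> List[str]:
    """Return only the files written by the current compaction attempt.

    Assumption (hypothetical naming): every file name contains
    '-<load_start_time>.' just before its extension, so files left behind
    by an earlier failed attempt carry a different timestamp.
    """
    marker = "-" + load_start_time + "."
    return sorted(name for name in file_names if marker in name)


# Segment directory after a failed attempt (1000) and a successful retry (2000).
segment_dir = [
    "part-0-1000.carbondata",   # stale, from the failed attempt
    "part-0-1000.carbonindex",  # stale, from the failed attempt
    "part-0-2000.carbondata",
    "part-0-2000.carbonindex",
]

current = files_for_current_compaction(segment_dir, "2000")
```

With this filter the segment file for the retry would list only the two `2000` files, even though the stale `1000` files are still present in the same segment directory.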





[jira] [Commented] (CARBONDATA-4030) Concurrent SI global sort cannot be success

2020-12-13 Thread Ajantha Bhat (Jira)


[ 
https://issues.apache.org/jira/browse/CARBONDATA-4030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17248777#comment-17248777
 ] 

Ajantha Bhat commented on CARBONDATA-4030:
--

https://github.com/apache/carbondata/pull/3949

> Concurrent SI global sort cannot be success
> ---
>
> Key: CARBONDATA-4030
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4030
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Ajantha Bhat
>Assignee: Ajantha Bhat
>Priority: Minor
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> When concurrent SI global sort loads are in progress, one load was removing 
> the table property added by the other load. So the global sort insert for 
> one load was failing with an error that it was unable to find the position id 
> in the projection.
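
One generic way to avoid this kind of race is to reference-count the shared property, so it is only removed when the last load that needs it has finished. The sketch below illustrates that idea only; it is not necessarily what the pull request does, and the class name `SharedTableProperties` and the property key are hypothetical.

```python
import threading
from typing import Dict, Optional


class SharedTableProperties:
    """Hypothetical holder for table properties shared by concurrent loads."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._values: Dict[str, str] = {}
        self._users: Dict[str, int] = {}

    def acquire(self, key: str, value: str) -> None:
        """Set the property and record one more load that depends on it."""
        with self._lock:
            self._values[key] = value
            self._users[key] = self._users.get(key, 0) + 1

    def release(self, key: str) -> None:
        """Drop one user; remove the property only when no load needs it."""
        with self._lock:
            remaining = self._users.get(key, 0) - 1
            if remaining > 0:
                self._users[key] = remaining
            else:
                self._users.pop(key, None)
                self._values.pop(key, None)

    def get(self, key: str) -> Optional[str]:
        with self._lock:
            return self._values.get(key)


props = SharedTableProperties()
props.acquire("position_id_required", "true")   # load A starts
props.acquire("position_id_required", "true")   # load B starts
props.release("position_id_required")           # load A finishes first
still_set_for_b = props.get("position_id_required")
props.release("position_id_required")           # load B finishes
after_both = props.get("position_id_required")
```

Here load B still sees the property after load A has finished, and the property disappears only once both loads have released it.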





[jira] [Resolved] (CARBONDATA-4030) Concurrent SI global sort cannot be success

2020-12-13 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat resolved CARBONDATA-4030.
--
Fix Version/s: 2.2.0
   Resolution: Fixed

> Concurrent SI global sort cannot be success
> ---
>
> Key: CARBONDATA-4030
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4030
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Ajantha Bhat
>Assignee: Ajantha Bhat
>Priority: Minor
> Fix For: 2.2.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> When concurrent SI global sort loads are in progress, one load was removing 
> the table property added by the other load. So the global sort insert for 
> one load was failing with an error that it was unable to find the position id 
> in the projection.





[GitHub] [carbondata] Karan980 commented on a change in pull request #4048: [CARBONDATA-4072] Clean files command is not deleting .segment files for the segments added through alter table add segment

2020-12-13 Thread GitBox


Karan980 commented on a change in pull request #4048:
URL: https://github.com/apache/carbondata/pull/4048#discussion_r542153838



##
File path: 
integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/addsegment/AddSegmentTestCase.scala
##
@@ -962,6 +962,88 @@ class AddSegmentTestCase extends QueryTest with BeforeAndAfterAll {
     sql(s"drop table $tableName")
   }
 
+  test("Test clean files on segments after compaction") {
+    CarbonProperties.getInstance()
+      .addProperty(CarbonCommonConstants.CARBON_CLEAN_FILES_FORCE_ALLOWED, "true")
+    createCarbonTable()
+    sql(
+      s"""LOAD DATA local inpath '$resourcesPath/data.csv'
+         | INTO TABLE addsegment1 OPTIONS('DELIMITER'= ',', 'QUOTECHAR'= '"')""".stripMargin)
+    val table = CarbonEnv.getCarbonTable(None, "addsegment1") (sqlContext.sparkSession)
+    val path = CarbonTablePath.getSegmentPath(table.getTablePath, "1")
+    val newPath = storeLocation + "/" + "addsegtest"
+    for (i <- 0 until 6) {

Review comment:
   I don't understand the problem with this approach.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] Karan980 commented on a change in pull request #4048: [CARBONDATA-4072] Clean files command is not deleting .segment files for the segments added through alter table add segment

2020-12-13 Thread GitBox


Karan980 commented on a change in pull request #4048:
URL: https://github.com/apache/carbondata/pull/4048#discussion_r542119343



##
File path: 
core/src/main/java/org/apache/carbondata/core/metadata/SegmentFileStore.java
##
@@ -1125,19 +1125,21 @@ public static void deleteSegment(String tablePath, Segment segment,
     SegmentFileStore fileStore = new SegmentFileStore(tablePath, segment.getSegmentFileName());
     List<CarbonFile> indexOrMergeFiles = fileStore.readIndexFiles(SegmentStatus.SUCCESS, true,
         FileFactory.getConfiguration());
-    Map<String, List<String>> indexFilesMap = fileStore.getIndexFilesMap();

Review comment:
   I think this check is fine, because the same logic is used in the 
FolderDetails class to set the isRelative variable. If I use the isRelative 
variable here, I have to generate the keys for locationMap, because this 
variable is present inside locationMap.

##
File path: 
integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/addsegment/AddSegmentTestCase.scala
##
@@ -962,6 +962,88 @@ class AddSegmentTestCase extends QueryTest with BeforeAndAfterAll {
     sql(s"drop table $tableName")
   }
 

Review comment:
   Test cases moved to testCleanFilesCommand; also added test cases for 
partition tables and mixed formats.









[jira] [Resolved] (CARBONDATA-3917) The rows of data loading is not accurate, more rows has been loaded

2020-12-13 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat resolved CARBONDATA-3917.
--
Fix Version/s: 2.2.0
   Resolution: Fixed

[https://github.com/apache/carbondata/pull/3943]

[https://github.com/apache/carbondata/pull/3967]

> The rows of data loading is not accurate, more rows has been loaded
> ---
>
> Key: CARBONDATA-3917
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3917
> Project: CarbonData
>  Issue Type: Bug
>  Components: data-load
>Affects Versions: 2.0.0
>Reporter: Taoli
>Priority: Blocker
> Fix For: 2.2.0
>
>
> 2020-07-18 18:46:23,856 | INFO | [Executor task launch worker for task 28380] | Total rows processed in step Data Writer: 1277745 | org.apache.carbondata.processing.loading.AbstractDataLoadProcessorStep.close(AbstractDataLoadProcessorStep.java:138)
> 2020-07-18 18:46:23,857 | INFO | [Executor task launch worker for task 28380] | Total rows processed in step Sort Processor: 1189959 | org.apache.carbondata.processing.loading.AbstractDataLoadProcessorStep.close(AbstractDataLoadProcessorStep.java:138)
> 2020-07-18 18:46:23,856 | DEBUG | [LocalFolderDeletionPool:detail_cdr_s1mme_18461_1595087183856] | PrivilegedAction as:omm (auth:SIMPLE) from:org.apache.carbondata.core.util.CarbonUtil.deleteFoldersAndFiles(CarbonUtil.java:298) | org.apache.hadoop.security.UserGroupInformation.logPrivilegedAction(UserGroupInformation.java:1756)
> 2020-07-18 18:46:23,857 | INFO | [Executor task launch worker for task 28380] | Total rows processed in step Data Converter: 1189959 | org.apache.carbondata.processing.loading.AbstractDataLoadProcessorStep.close(AbstractDataLoadProcessorStep.java:138)
> 2020-07-18 18:46:23,857 | INFO | [Executor task launch worker for task 28380] | Total rows processed in step Input Processor: 1189959 | org.apache.carbondata.processing.loading.AbstractDataLoadProcessorStep.close(AbstractDataLoadProcessorStep.java:138)
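
The mismatch in the log above can be checked mechanically. The following is a small sketch, assuming only the "Total rows processed in step ..." message format seen in these log lines; the function names and the choice of "Input Processor" as the reference step are illustrative, not part of CarbonData.

```python
import re
from typing import Dict

# Matches the message part of the step summary lines quoted in the issue.
STEP_ROWS = re.compile(
    r"Total rows processed in step (?P<step>[^:]+): (?P<rows>\d+)")


def rows_per_step(log_text: str) -> Dict[str, int]:
    """Collect the row count reported by each load step."""
    return {m.group("step"): int(m.group("rows"))
            for m in STEP_ROWS.finditer(log_text)}


def mismatched_steps(counts: Dict[str, int],
                     reference_step: str = "Input Processor") -> Dict[str, int]:
    """Return steps whose count differs from the reference, with the difference."""
    expected = counts[reference_step]
    return {step: rows - expected
            for step, rows in counts.items() if rows != expected}


# Message fields copied from the log lines in the issue description.
SAMPLE_LOG = """
| Total rows processed in step Data Writer: 1277745 |
| Total rows processed in step Sort Processor: 1189959 |
| Total rows processed in step Data Converter: 1189959 |
| Total rows processed in step Input Processor: 1189959 |
"""

counts = rows_per_step(SAMPLE_LOG)
extra = mismatched_steps(counts)
```

For the sample above, `extra` is `{'Data Writer': 87786}`: the Data Writer step reports 87786 more rows than the three earlier steps.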





[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4004: [CARBONDATA-4083] Refactor Update and Support Update Atomicity

2020-12-13 Thread GitBox


CarbonDataQA2 commented on pull request #4004:
URL: https://github.com/apache/carbondata/pull/4004#issuecomment-744118725


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3392/
   







[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4004: [CARBONDATA-4083] Refactor Update and Support Update Atomicity

2020-12-13 Thread GitBox


CarbonDataQA2 commented on pull request #4004:
URL: https://github.com/apache/carbondata/pull/4004#issuecomment-744118569


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5154/
   







[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4004: [CARBONDATA-4083] Refactor Update and Support Update Atomicity

2020-12-13 Thread GitBox


CarbonDataQA2 commented on pull request #4004:
URL: https://github.com/apache/carbondata/pull/4004#issuecomment-744069608


   Build Failed  with Spark 2.4.5, Please check CI 
http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3391/
   







[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4004: [CARBONDATA-4083] Refactor Update and Support Update Atomicity

2020-12-13 Thread GitBox


CarbonDataQA2 commented on pull request #4004:
URL: https://github.com/apache/carbondata/pull/4004#issuecomment-744069509


   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5153/
   







[jira] [Created] (CARBONDATA-4083) Refactor Update and Support Update Atomicity

2020-12-13 Thread Xingjun Hao (Jira)
Xingjun Hao created CARBONDATA-4083:
---

 Summary: Refactor Update and Support Update Atomicity
 Key: CARBONDATA-4083
 URL: https://issues.apache.org/jira/browse/CARBONDATA-4083
 Project: CarbonData
  Issue Type: Improvement
Reporter: Xingjun Hao


Currently, we modify the tablestatus file several times in the update flow. 
In total, 4 tablestatus write ops break atomicity to a certain extent, which 
may produce dirty data in update failure scenarios.

The first time we update tablestatus is when writing delta files: first we 
update updatedeltastarttime and updatedeltaendtime in the tablestatus, then 
delete some segments, which brings 2 tablestatus write ops.

The second time we update tablestatus is when inserting new data. Just like 
the first time, this brings 2 tablestatus write ops.

Also, auto compaction doesn't work for UPDATE. UPDATE won't trigger MINOR 
compaction even when carbon.merge.auto.compaction is turned on.
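
As a rough illustration of the direction (not the actual refactoring in the pull request), the separate writes can be collapsed into one by applying all pending changes in memory and replacing the status file once. The sketch assumes a JSON dictionary on a local filesystem, where `os.replace` is an atomic rename; the real tablestatus format and storage layer differ, and `commit_status_once` is a hypothetical helper.

```python
import json
import os
import tempfile
from typing import Callable, Dict, List


def commit_status_once(path: str,
                       changes: List[Callable[[Dict], None]]) -> Dict:
    """Apply every pending change in memory, then write the file a single time."""
    status: Dict = {}
    if os.path.exists(path):
        with open(path, "r", encoding="utf-8") as handle:
            status = json.load(handle)

    for change in changes:
        change(status)

    # Write to a temporary file in the same directory, then rename over the
    # target, so readers see either the old status or the new one.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(status, handle)
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return status
```

A caller would pass the delta timestamps, the segment deletions and the new-segment entry as separate callables in `changes`; if any of them raises, the existing file is left untouched.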





[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4004: [WIP] update actomity

2020-12-13 Thread GitBox


CarbonDataQA2 commented on pull request #4004:
URL: https://github.com/apache/carbondata/pull/4004#issuecomment-744054016


   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5152/
   







[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4004: [WIP] update actomity

2020-12-13 Thread GitBox


CarbonDataQA2 commented on pull request #4004:
URL: https://github.com/apache/carbondata/pull/4004#issuecomment-744053885


   Build Failed  with Spark 2.4.5, Please check CI 
http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3390/
   







[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4004: [WIP] update actomity

2020-12-13 Thread GitBox


CarbonDataQA2 commented on pull request #4004:
URL: https://github.com/apache/carbondata/pull/4004#issuecomment-744052048


   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5151/
   







[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4004: [WIP] update actomity

2020-12-13 Thread GitBox


CarbonDataQA2 commented on pull request #4004:
URL: https://github.com/apache/carbondata/pull/4004#issuecomment-744051932


   Build Failed  with Spark 2.4.5, Please check CI 
http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3389/
   







[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4004: [WIP] update actomity

2020-12-13 Thread GitBox


CarbonDataQA2 commented on pull request #4004:
URL: https://github.com/apache/carbondata/pull/4004#issuecomment-744050601


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3388/
   







[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4004: [WIP] update actomity

2020-12-13 Thread GitBox


CarbonDataQA2 commented on pull request #4004:
URL: https://github.com/apache/carbondata/pull/4004#issuecomment-744049122


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5150/
   







[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4004: [WIP] update actomity

2020-12-13 Thread GitBox


CarbonDataQA2 commented on pull request #4004:
URL: https://github.com/apache/carbondata/pull/4004#issuecomment-744030809


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3387/
   







[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4004: [WIP] update actomity

2020-12-13 Thread GitBox


CarbonDataQA2 commented on pull request #4004:
URL: https://github.com/apache/carbondata/pull/4004#issuecomment-744030624


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5149/
   







[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4004: [WIP] update actomity

2020-12-13 Thread GitBox


CarbonDataQA2 commented on pull request #4004:
URL: https://github.com/apache/carbondata/pull/4004#issuecomment-743978025


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3386/
   







[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4004: [WIP] update actomity

2020-12-13 Thread GitBox


CarbonDataQA2 commented on pull request #4004:
URL: https://github.com/apache/carbondata/pull/4004#issuecomment-743977796


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5148/
   


