[GitHub] [carbondata] CarbonDataQA1 commented on issue #3603: [CARBONDATA-3679] Optimize local sort performance
CarbonDataQA1 commented on issue #3603: [CARBONDATA-3679] Optimize local sort performance URL: https://github.com/apache/carbondata/pull/3603#issuecomment-586560551 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2008/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3612: [CARBONDATA-3694] Separate Materialized View command from DataMap command
CarbonDataQA1 commented on issue #3612: [CARBONDATA-3694] Separate Materialized View command from DataMap command URL: https://github.com/apache/carbondata/pull/3612#issuecomment-586558926 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2005/
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3609: [CARBONDATA-3689] Support independent MV extension and MV syntax
CarbonDataQA1 commented on issue #3609: [CARBONDATA-3689] Support independent MV extension and MV syntax URL: https://github.com/apache/carbondata/pull/3609#issuecomment-586558231 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2004/
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3603: [CARBONDATA-3679] Optimize local sort performance
CarbonDataQA1 commented on issue #3603: [CARBONDATA-3679] Optimize local sort performance URL: https://github.com/apache/carbondata/pull/3603#issuecomment-586556854 Build Success with Spark 2.4.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/304/
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3603: [CARBONDATA-3679] Optimize local sort performance
CarbonDataQA1 commented on issue #3603: [CARBONDATA-3679] Optimize local sort performance URL: https://github.com/apache/carbondata/pull/3603#issuecomment-586555694 Build Success with Spark 2.4.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/303/
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3622: [CARBONDATA-3702] Clean temp index files in parallel in merge index flow
CarbonDataQA1 commented on issue #3622: [CARBONDATA-3702] Clean temp index files in parallel in merge index flow URL: https://github.com/apache/carbondata/pull/3622#issuecomment-586554100 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2003/
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3612: [CARBONDATA-3694] Separate Materialized View command from DataMap command
CarbonDataQA1 commented on issue #3612: [CARBONDATA-3694] Separate Materialized View command from DataMap command URL: https://github.com/apache/carbondata/pull/3612#issuecomment-586553862 Build Success with Spark 2.4.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/301/
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3609: [CARBONDATA-3689] Support independent MV extension and MV syntax
CarbonDataQA1 commented on issue #3609: [CARBONDATA-3689] Support independent MV extension and MV syntax URL: https://github.com/apache/carbondata/pull/3609#issuecomment-586553571 Build Success with Spark 2.4.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/300/
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3603: [CARBONDATA-3679] Optimize local sort performance
CarbonDataQA1 commented on issue #3603: [CARBONDATA-3679] Optimize local sort performance URL: https://github.com/apache/carbondata/pull/3603#issuecomment-586553316 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2006/
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3603: [CARBONDATA-3679] Optimize local sort performance
CarbonDataQA1 commented on issue #3603: [CARBONDATA-3679] Optimize local sort performance URL: https://github.com/apache/carbondata/pull/3603#issuecomment-586553206 Build Failed with Spark 2.4.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/302/
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3615: [CARBONDATA-3637] Use optimized insert flow for MV and insert stage command
CarbonDataQA1 commented on issue #3615: [CARBONDATA-3637] Use optimized insert flow for MV and insert stage command URL: https://github.com/apache/carbondata/pull/3615#issuecomment-586552805 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2002/
[jira] [Closed] (CARBONDATA-3689) Support independent MV Extension for Spark
[ https://issues.apache.org/jira/browse/CARBONDATA-3689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jacky Li closed CARBONDATA-3689.
Resolution: Not A Problem

> Support independent MV Extension for Spark
>
> Key: CARBONDATA-3689
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3689
> Project: CarbonData
> Issue Type: Improvement
> Reporter: Jacky Li
> Priority: Major
> Time Spent: 4h
> Remaining Estimate: 0h
>
> To better promote Materialized View usage, we can make Materialized View
> an independent extension for Apache Spark.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [carbondata] jackylk closed pull request #3609: [CARBONDATA-3689] Support independent MV extension and MV syntax
jackylk closed pull request #3609: [CARBONDATA-3689] Support independent MV extension and MV syntax URL: https://github.com/apache/carbondata/pull/3609
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3622: [CARBONDATA-3702] Clean temp index files in parallel in merge index flow
CarbonDataQA1 commented on issue #3622: [CARBONDATA-3702] Clean temp index files in parallel in merge index flow URL: https://github.com/apache/carbondata/pull/3622#issuecomment-586551417 Build Success with Spark 2.4.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/299/
[jira] [Created] (CARBONDATA-3703) Support insert stage in parallel
Xingjun Hao created CARBONDATA-3703:

Summary: Support insert stage in parallel
Key: CARBONDATA-3703
URL: https://issues.apache.org/jira/browse/CARBONDATA-3703
Project: CarbonData
Issue Type: Improvement
Reporter: Xingjun Hao
[GitHub] [carbondata] marchpure opened a new pull request #3622: [CARBONDATA-3702] Clean temp index files in parallel in merge index flow
marchpure opened a new pull request #3622: [CARBONDATA-3702] Clean temp index files in parallel in merge index flow
URL: https://github.com/apache/carbondata/pull/3622

### Why is this PR needed?
Cleaning temp index files in the merge index flow currently takes a long time, sometimes 2 to 3 minutes, and should be optimized.

### What changes were proposed in this PR?
Clean temp index files in parallel in the merge index flow.

### Does this PR introduce any user interface change?
- No

### Is any new testcase added?
- No
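As an illustration of the idea only, the following sketch deletes a batch of files concurrently with a fixed thread pool. It is not the code from this PR: the class `ParallelTempFileCleaner`, its methods, the thread count, and the demo file names are all hypothetical, and only standard JDK calls are used.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical helper (not CarbonData code): deletes temp files concurrently. */
final class ParallelTempFileCleaner {

  /** Deletes every path on a fixed-size pool and returns how many files were removed. */
  static int deleteAll(List<Path> files, int threads) {
    ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, threads));
    try {
      List<Future<Boolean>> results = new ArrayList<>();
      for (Path file : files) {
        // One task per file; deleteIfExists returns false if the file is already gone.
        Callable<Boolean> task = () -> Files.deleteIfExists(file);
        results.add(pool.submit(task));
      }
      int deleted = 0;
      for (Future<Boolean> result : results) {
        if (result.get()) {
          deleted++;
        }
      }
      return deleted;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while cleaning temp files", e);
    } catch (ExecutionException e) {
      throw new IllegalStateException("failed to delete a temp file", e.getCause());
    } finally {
      pool.shutdown();
    }
  }

  /** Demo helper: creates n empty files in a fresh temp directory (the directory is left behind). */
  static List<Path> createTempIndexFiles(int n) {
    try {
      Path dir = Files.createTempDirectory("merge-index-demo");
      List<Path> files = new ArrayList<>();
      for (int i = 0; i < n; i++) {
        // The file name pattern is illustrative only.
        files.add(Files.createFile(dir.resolve("part-" + i + ".carbonindex")));
      }
      return files;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    List<Path> files = createTempIndexFiles(8);
    System.out.println("deleted " + deleteAll(files, 4) + " files");
  }
}
```

Whether parallel deletion helps depends on the storage backend; on object stores, where each delete is a network round trip, issuing the calls concurrently is usually where the time is saved.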
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3615: [CARBONDATA-3637] Use optimized insert flow for MV and insert stage command
CarbonDataQA1 commented on issue #3615: [CARBONDATA-3637] Use optimized insert flow for MV and insert stage command URL: https://github.com/apache/carbondata/pull/3615#issuecomment-586549542 Build Success with Spark 2.4.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/298/
[jira] [Created] (CARBONDATA-3702) Clean temp index files in parallel in merge index flow
Xingjun Hao created CARBONDATA-3702:

Summary: Clean temp index files in parallel in merge index flow
Key: CARBONDATA-3702
URL: https://issues.apache.org/jira/browse/CARBONDATA-3702
Project: CarbonData
Issue Type: Improvement
Reporter: Xingjun Hao

Cleaning temp index files in the merge index flow currently takes a long time, sometimes 2 to 3 minutes, and should be optimized.
[GitHub] [carbondata] QiangCai commented on issue #3538: [CARBONDATA-3637] Optimize insert into flow
QiangCai commented on issue #3538: [CARBONDATA-3637] Optimize insert into flow URL: https://github.com/apache/carbondata/pull/3538#issuecomment-586535639 LGTM
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3621: [HOTFIX] Support both listfile() and listfile(maxCount) in InsertStag…
CarbonDataQA1 commented on issue #3621: [HOTFIX] Support both listfile() and listfile(maxCount) in InsertStag… URL: https://github.com/apache/carbondata/pull/3621#issuecomment-586432483 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2000/
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3584: [WIP] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache
CarbonDataQA1 commented on issue #3584: [WIP] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache URL: https://github.com/apache/carbondata/pull/3584#issuecomment-586429154 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2001/
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3584: [WIP] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache
CarbonDataQA1 commented on issue #3584: [WIP] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache URL: https://github.com/apache/carbondata/pull/3584#issuecomment-586398107 Build Success with Spark 2.4.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/297/
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3621: [HOTFIX] Support both listfile() and listfile(maxCount) in InsertStag…
CarbonDataQA1 commented on issue #3621: [HOTFIX] Support both listfile() and listfile(maxCount) in InsertStag… URL: https://github.com/apache/carbondata/pull/3621#issuecomment-586395416 Build Success with Spark 2.4.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/296/
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3620: [CARBONDATA-3700] Optimize pruning performance when prunning with multi…
CarbonDataQA1 commented on issue #3620: [CARBONDATA-3700] Optimize pruning performance when prunning with multi… URL: https://github.com/apache/carbondata/pull/3620#issuecomment-586385984 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1998/
[GitHub] [carbondata] marchpure opened a new pull request #3621: [HOTFIX] Support both listfile() and listfile(maxCount) in InsertStag…
marchpure opened a new pull request #3621: [HOTFIX] Support both listfile() and listfile(maxCount) in InsertStag…
URL: https://github.com/apache/carbondata/pull/3621

…e flow

### Why is this PR needed?
Flink writes stage files to OBS with file names in time order. When stage files are loaded into CarbonData, they are listed and loaded with the parameter "maxcount", which returns the "maxcount" files with the smallest file names, in other words the "maxcount" files with the earliest generation time. In some scenarios, however, the file names of stage files may not be in time order, so a switch is used to decide whether to list files with a specified batch size or to list all files in the stage directory.

### What changes were proposed in this PR?
A switch "CARBON_STAGE_FILENAME_IS_IN_ORDER_OF_TIME" is added.

### Does this PR introduce any user interface change?
- No

### Is any new testcase added?
- No
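The decision the switch makes can be sketched as follows. This is a minimal illustration under stated assumptions, not the PR's implementation: `StageFileLister`, `selectStageFiles`, and the boolean parameter standing in for "CARBON_STAGE_FILENAME_IS_IN_ORDER_OF_TIME" are hypothetical names, and file names are modeled as plain strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch of the listing switch; the names are illustrative, not CarbonData APIs. */
final class StageFileLister {

  /**
   * If file names sort in time order, return only the maxCount smallest names,
   * which are then the earliest files. Otherwise name order says nothing about
   * age, so every file in the stage directory is returned.
   */
  static List<String> selectStageFiles(List<String> names, boolean nameInTimeOrder, int maxCount) {
    if (!nameInTimeOrder) {
      return new ArrayList<>(names);
    }
    List<String> sorted = new ArrayList<>(names);
    Collections.sort(sorted);
    int limit = Math.min(Math.max(0, maxCount), sorted.size());
    return new ArrayList<>(sorted.subList(0, limit));
  }

  public static void main(String[] args) {
    List<String> names = Arrays.asList("0003", "0001", "0002");
    System.out.println(selectStageFiles(names, true, 2));   // prints [0001, 0002]
    System.out.println(selectStageFiles(names, false, 2));  // prints [0003, 0001, 0002]
  }
}
```

With the switch off, the batch size is ignored entirely, which matches the description's "just list all files in the stage dir" branch.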
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3584: [WIP] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache
CarbonDataQA1 commented on issue #3584: [WIP] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache URL: https://github.com/apache/carbondata/pull/3584#issuecomment-586383567 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1999/
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3584: [WIP] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache
CarbonDataQA1 commented on issue #3584: [WIP] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache URL: https://github.com/apache/carbondata/pull/3584#issuecomment-586352268 Build Success with Spark 2.4.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/295/
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3620: [CARBONDATA-3700] Optimize pruning performance when prunning with multi…
CarbonDataQA1 commented on issue #3620: [CARBONDATA-3700] Optimize pruning performance when prunning with multi… URL: https://github.com/apache/carbondata/pull/3620#issuecomment-586351917 Build Success with Spark 2.4.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/294/
[GitHub] [carbondata] marchpure commented on issue #3620: [CARBONDATA-3700] Optimize pruning performance when prunning with multi…
marchpure commented on issue #3620: [CARBONDATA-3700] Optimize pruning performance when prunning with multi… URL: https://github.com/apache/carbondata/pull/3620#issuecomment-586341516 retest this please
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3620: [CARBONDATA-3700] Optimize pruning performance when prunning with multi…
CarbonDataQA1 commented on issue #3620: [CARBONDATA-3700] Optimize pruning performance when prunning with multi… URL: https://github.com/apache/carbondata/pull/3620#issuecomment-586339554 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1997/
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3620: [CARBONDATA-3700] Optimize pruning performance when prunning with multi…
CarbonDataQA1 commented on issue #3620: [CARBONDATA-3700] Optimize pruning performance when prunning with multi… URL: https://github.com/apache/carbondata/pull/3620#issuecomment-586338942 Build Failed with Spark 2.4.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/293/
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3612: [CARBONDATA-3694] Separate Materialized View command from DataMap command
CarbonDataQA1 commented on issue #3612: [CARBONDATA-3694] Separate Materialized View command from DataMap command URL: https://github.com/apache/carbondata/pull/3612#issuecomment-586336996 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1995/
[GitHub] [carbondata] marchpure opened a new pull request #3620: [CARBONDATA-3700] Optimize prune performance when prunning with multi…
marchpure opened a new pull request #3620: [CARBONDATA-3700] Optimize prune performance when prunning with multi…
URL: https://github.com/apache/carbondata/pull/3620

…-threads

Why is this PR needed?
When pruning with multiple threads, a bug severely hampers pruning performance. When pruning finds no blocklets matching the query filter, the getExtendblocklet function is still triggered to fetch the extended blocklet metadata. Given an empty blocklet list as input, this function is expected to return an empty extended blocklet list directly, but a bug currently causes a meaningless "hashset add operation" overhead. In addition, when pruning with multiple threads, the getExtendblocklet function is triggered for each blocklet; this should be avoided by triggering it once per segment.

What changes were proposed in this PR?
1) If the input to the getExtendblocklet function is an empty blocklet list, return an empty extended blocklet list directly.
2) Trigger the getExtendblocklet function for each segment instead of for each blocklet.

Does this PR introduce any user interface change?
No.

Is any new testcase added?
Yes.
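The two proposed changes can be sketched roughly as below. Everything here is a hypothetical stand-in rather than CarbonData's real classes: blocklets and segments are modeled as strings, `ExtendedBlockletLookup` and its methods are invented for illustration, and the sketch is single-threaded (the plain `lookups` counter is not thread-safe).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: early return on empty input, and one lookup per segment. */
final class ExtendedBlockletLookup {

  /** Counts how often the (assumed expensive) lookup does real work. */
  static int lookups = 0;

  /** Stand-in for the extended-blocklet lookup; returns immediately for an empty list. */
  static List<String> getExtendedBlocklets(String segment, List<String> blocklets) {
    if (blocklets.isEmpty()) {
      return Collections.emptyList(); // change 1: no set or map work for empty input
    }
    lookups++;
    List<String> extended = new ArrayList<>();
    for (String blocklet : blocklets) {
      extended.add(segment + "/" + blocklet + "#detail");
    }
    return extended;
  }

  /** Change 2: call the lookup once per segment with all of its pruned blocklets. */
  static List<String> resolve(Map<String, List<String>> prunedBySegment) {
    List<String> all = new ArrayList<>();
    for (Map.Entry<String, List<String>> entry : prunedBySegment.entrySet()) {
      all.addAll(getExtendedBlocklets(entry.getKey(), entry.getValue()));
    }
    return all;
  }

  public static void main(String[] args) {
    Map<String, List<String>> pruned = new LinkedHashMap<>();
    pruned.put("seg0", Arrays.asList("b0", "b1"));
    pruned.put("seg1", Collections.<String>emptyList());
    System.out.println(resolve(pruned) + " via " + lookups + " lookup(s)");
  }
}
```

In this toy model two blocklets in one segment cost a single lookup, and the segment with no surviving blocklets costs none, which is the saving the description aims for.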
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3609: [CARBONDATA-3689] Support independent MV extension and MV syntax
CarbonDataQA1 commented on issue #3609: [CARBONDATA-3689] Support independent MV extension and MV syntax URL: https://github.com/apache/carbondata/pull/3609#issuecomment-586303517 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1994/
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3612: [CARBONDATA-3694] Separate Materialized View command from DataMap command
CarbonDataQA1 commented on issue #3612: [CARBONDATA-3694] Separate Materialized View command from DataMap command URL: https://github.com/apache/carbondata/pull/3612#issuecomment-586302525 Build Success with Spark 2.4.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/291/
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3584: [WIP] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache
CarbonDataQA1 commented on issue #3584: [WIP] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache URL: https://github.com/apache/carbondata/pull/3584#issuecomment-586298946 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1996/
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3584: [WIP] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache
CarbonDataQA1 commented on issue #3584: [WIP] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache URL: https://github.com/apache/carbondata/pull/3584#issuecomment-586298614 Build Failed with Spark 2.4.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/292/
[GitHub] [carbondata] jackylk commented on a change in pull request #3612: [CARBONDATA-3694] Separate Materialized View command from DataMap command
jackylk commented on a change in pull request #3612: [CARBONDATA-3694] Separate Materialized View command from DataMap command
URL: https://github.com/apache/carbondata/pull/3612#discussion_r379436732

## File path: core/src/main/java/org/apache/carbondata/core/metadata/schema/table/DataMapSchema.java ##

@@ -290,4 +301,71 @@ public boolean isTimeSeries() {
   public void setTimeSeries(boolean timeSeries) {
     isTimeSeries = timeSeries;
   }
+
+  public boolean supportIncrementalBuild() {
+    String prop = getProperties().get(DataMapProperty.FULL_REFRESH);
+    return prop == null || prop.equalsIgnoreCase("false");
+  }
+
+  public String getPropertiesAsString() {
+    String[] properties = getProperties().entrySet().stream()
+        // ignore internal used property
+        .filter(p ->
+            !p.getKey().equalsIgnoreCase(DataMapProperty.DEFERRED_REBUILD) &&
+            !p.getKey().equalsIgnoreCase(DataMapProperty.CHILD_SELECT_QUERY) &&

Review comment: done
[GitHub] [carbondata] jackylk commented on a change in pull request #3612: [CARBONDATA-3694] Separate Materialized View command from DataMap command
jackylk commented on a change in pull request #3612: [CARBONDATA-3694] Separate Materialized View command from DataMap command
URL: https://github.com/apache/carbondata/pull/3612#discussion_r379436265

## File path: datamap/mv/core/src/main/scala/org/apache/carbondata/mv/extension/command/ShowMaterializedViewCommand.scala ##

@@ -0,0 +1,97 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.mv.extension.command
+
+import java.util
+
+import scala.collection.JavaConverters._
+
+import org.apache.spark.sql.{CarbonEnv, Row, SparkSession}
+import org.apache.spark.sql.catalyst.TableIdentifier
+import org.apache.spark.sql.catalyst.expressions.{Attribute, AttributeReference}
+import org.apache.spark.sql.execution.command.{Checker, DataCommand}
+import org.apache.spark.sql.types.{BooleanType, StringType}
+
+import org.apache.carbondata.core.datamap.DataMapStoreManager
+import org.apache.carbondata.core.metadata.schema.table.DataMapSchema
+import org.apache.carbondata.mv.extension.MVDataMapProvider
+
+/**
+ * Show Materialized View Command implementation
+ *
+ */
+case class ShowMaterializedViewCommand(tableIdentifier: Option[TableIdentifier])
+  extends DataCommand {
+
+  override def output: Seq[Attribute] = {
+    Seq(
+      AttributeReference("Name", StringType, nullable = false)(),
+      AttributeReference("Associated Table", StringType, nullable = false)(),
+      AttributeReference("Refresh", StringType, nullable = false)(),
+      AttributeReference("Incremental", BooleanType, nullable = false)(),
+      AttributeReference("Properties", StringType, nullable = false)(),
+      AttributeReference("Status", StringType, nullable = false)(),
+      AttributeReference("Sync Info", StringType, nullable = false)())
+  }
+
+  override def processData(sparkSession: SparkSession): Seq[Row] = {
+    convertToRow(getAllMVSchema(sparkSession))
+  }
+
+  /**
+   * get all datamaps for this table, including preagg, index datamaps and mv

Review comment: done
[GitHub] [carbondata] jackylk commented on a change in pull request #3612: [CARBONDATA-3694] Separate Materialized View command from DataMap command
jackylk commented on a change in pull request #3612: [CARBONDATA-3694] Separate Materialized View command from DataMap command URL: https://github.com/apache/carbondata/pull/3612#discussion_r379435462 ## File path: datamap/mv/core/src/main/scala/org/apache/carbondata/mv/extension/command/CreateMaterializedViewCommand.scala ## @@ -0,0 +1,106 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.carbondata.mv.extension.command + +import scala.collection.JavaConverters._ + +import org.apache.spark.sql._ +import org.apache.spark.sql.execution.command._ + +import org.apache.carbondata.common.exceptions.sql.MalformedMaterializedViewException +import org.apache.carbondata.common.logging.LogServiceFactory +import org.apache.carbondata.core.datamap.{DataMapProvider, DataMapStoreManager} +import org.apache.carbondata.core.datamap.status.DataMapStatusManager +import org.apache.carbondata.core.metadata.schema.datamap.DataMapProperty +import org.apache.carbondata.core.metadata.schema.table.DataMapSchema +import org.apache.carbondata.core.util.CarbonProperties +import org.apache.carbondata.datamap.DataMapManager +import org.apache.carbondata.events._ +import org.apache.carbondata.mv.extension.MVDataMapProvider + +/** + * Create Materialized View Command implementation + * It will create the MV table, load the MV table (if deferred rebuild is false), + * and register the MV schema in [[DataMapStoreManager]] + */ +case class CreateMaterializedViewCommand( +mvName: String, +properties: Map[String, String], +queryString: Option[String], +ifNotExistsSet: Boolean = false, +deferredRebuild: Boolean = false) + extends AtomicRunnableCommand { + + private val LOGGER = LogServiceFactory.getLogService(this.getClass.getName) + private var dataMapProvider: DataMapProvider = _ + private var dataMapSchema: DataMapSchema = _ + + override def processMetadata(sparkSession: SparkSession): Seq[Row] = { + +setAuditInfo(Map("mvName" -> mvName) ++ properties) + +dataMapSchema = new DataMapSchema(mvName, MVDataMapProvider.MV_PROVIDER_NAME) +val property = properties.map(x => (x._1.trim, x._2.trim)).asJava +val javaMap = new java.util.HashMap[String, String](property) +javaMap.put(DataMapProperty.DEFERRED_REBUILD, deferredRebuild.toString) +dataMapSchema.setProperties(javaMap) + +dataMapProvider = DataMapManager.get.getDataMapProvider(null, dataMapSchema, 
sparkSession) +if (DataMapStoreManager.getInstance().getAllDataMapSchemas.asScala + .exists(_.getDataMapName.equalsIgnoreCase(dataMapSchema.getDataMapName))) { + if (!ifNotExistsSet) { +throw new MalformedMaterializedViewException( Review comment: done This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [carbondata] jackylk commented on a change in pull request #3612: [CARBONDATA-3694] Separate Materialized View command from DataMap command
jackylk commented on a change in pull request #3612: [CARBONDATA-3694] Separate Materialized View command from DataMap command URL: https://github.com/apache/carbondata/pull/3612#discussion_r379435075 ## File path: core/src/main/java/org/apache/carbondata/core/metadata/schema/table/DataMapSchema.java ## @@ -290,4 +301,71 @@ public boolean isTimeSeries() { public void setTimeSeries(boolean timeSeries) { isTimeSeries = timeSeries; } + + public boolean supportIncrementalBuild() { +String prop = getProperties().get(DataMapProperty.FULL_REFRESH); +return prop == null || prop.equalsIgnoreCase("false"); + } + + public String getPropertiesAsString() { +String[] properties = getProperties().entrySet().stream() +// ignore internal used property +.filter(p -> +!p.getKey().equalsIgnoreCase(DataMapProperty.DEFERRED_REBUILD) && +!p.getKey().equalsIgnoreCase(DataMapProperty.CHILD_SELECT_QUERY) && +!p.getKey().equalsIgnoreCase(DataMapProperty.QUERY_TYPE) && +!p.getKey().equalsIgnoreCase(DataMapProperty.FULL_REFRESH)) +.map(p -> "'" + p.getKey() + "'='" + p.getValue() + "'") +.sorted() +.toArray(String[]::new); +return Strings.mkString(properties, ","); + } + + public String getTable() { Review comment: done This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [carbondata] jackylk commented on a change in pull request #3612: [CARBONDATA-3694] Separate Materialized View command from DataMap command
jackylk commented on a change in pull request #3612: [CARBONDATA-3694] Separate Materialized View command from DataMap command URL: https://github.com/apache/carbondata/pull/3612#discussion_r379435221 ## File path: datamap/mv/core/src/main/scala/org/apache/carbondata/mv/extension/MVAnalyzerRule.scala ## @@ -56,7 +56,7 @@ class MVAnalyzerRule(sparkSession: SparkSession) extends Rule[LogicalPlan] { // first check if any mv UDF is applied it is present is in plan // then call is from create MV so no need to transform the query plan // TODO Add different UDF name - case al@Alias(udf: ScalaUDF, name) if name.equalsIgnoreCase(CarbonEnv.MV_SKIP_RULE_UDF) => + case al@Alias(udf: ScalaUDF, name) if name.equalsIgnoreCase(MVUdf.MV_SKIP_RULE_UDF) => Review comment: I think MV should be capitalized.
[GitHub] [carbondata] jackylk commented on a change in pull request #3612: [CARBONDATA-3694] Separate Materialized View command from DataMap command
jackylk commented on a change in pull request #3612: [CARBONDATA-3694] Separate Materialized View command from DataMap command URL: https://github.com/apache/carbondata/pull/3612#discussion_r379434538 ## File path: core/src/main/java/org/apache/carbondata/core/metadata/schema/table/DataMapSchema.java ## @@ -290,4 +301,71 @@ public boolean isTimeSeries() { public void setTimeSeries(boolean timeSeries) { isTimeSeries = timeSeries; } + + public boolean supportIncrementalBuild() { +String prop = getProperties().get(DataMapProperty.FULL_REFRESH); +return prop == null || prop.equalsIgnoreCase("false"); + } + + public String getPropertiesAsString() { +String[] properties = getProperties().entrySet().stream() +// ignore internal used property +.filter(p -> +!p.getKey().equalsIgnoreCase(DataMapProperty.DEFERRED_REBUILD) && +!p.getKey().equalsIgnoreCase(DataMapProperty.CHILD_SELECT_QUERY) && +!p.getKey().equalsIgnoreCase(DataMapProperty.QUERY_TYPE) && +!p.getKey().equalsIgnoreCase(DataMapProperty.FULL_REFRESH)) +.map(p -> "'" + p.getKey() + "'='" + p.getValue() + "'") +.sorted() +.toArray(String[]::new); +return Strings.mkString(properties, ","); + } + + public String getTable() { +return relationIdentifier.getDatabaseName() + "." + relationIdentifier.getTableName(); Review comment: done This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
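As a rough illustration of what the `getPropertiesAsString` method quoted in the diff above produces, here is a minimal standalone sketch of the same filter, map, sort and join pipeline. The class name `PropertyFormatter` and the key names in `INTERNAL_KEYS` are placeholders invented for this example (the actual values of the `DataMapProperty` constants are not shown in the diff), and `Collectors.joining` stands in for the project's `Strings.mkString`.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical stand-in for the property formatting shown in the diff.
final class PropertyFormatter {
  // Placeholder names for internal properties; assumptions for illustration only.
  private static final Set<String> INTERNAL_KEYS =
      new HashSet<>(Arrays.asList("internal_a", "internal_b"));

  /** Renders user-visible properties as sorted, comma-separated 'key'='value' pairs. */
  static String format(Map<String, String> properties) {
    return properties.entrySet().stream()
        // Skip internal properties, comparing keys case-insensitively.
        .filter(e -> !INTERNAL_KEYS.contains(e.getKey().toLowerCase(Locale.ROOT)))
        .map(e -> "'" + e.getKey() + "'='" + e.getValue() + "'")
        // Sorting makes the output independent of map iteration order.
        .sorted()
        .collect(Collectors.joining(","));
  }
}
```

The sort step is what makes the rendered string stable across runs, which matters when the value is displayed in a SHOW command or compared in tests.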
[GitHub] [carbondata] jackylk commented on a change in pull request #3612: [CARBONDATA-3694] Separate Materialized View command from DataMap command
jackylk commented on a change in pull request #3612: [CARBONDATA-3694] Separate Materialized View command from DataMap command URL: https://github.com/apache/carbondata/pull/3612#discussion_r379429897 ## File path: core/src/main/java/org/apache/carbondata/core/metadata/schema/table/DataMapSchema.java ## @@ -290,4 +301,71 @@ public boolean isTimeSeries() { public void setTimeSeries(boolean timeSeries) { isTimeSeries = timeSeries; } + + public boolean supportIncrementalBuild() { Review comment: renamed to `canBeIncrementalBuild`
[GitHub] [carbondata] jackylk commented on a change in pull request #3612: [CARBONDATA-3694] Separate Materialized View command from DataMap command
jackylk commented on a change in pull request #3612: [CARBONDATA-3694] Separate Materialized View command from DataMap command URL: https://github.com/apache/carbondata/pull/3612#discussion_r379429200 ## File path: common/src/main/java/org/apache/carbondata/common/exceptions/sql/MalformedMaterializedViewException.java ## @@ -0,0 +1,41 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.common.exceptions.sql; + +import org.apache.carbondata.common.annotations.InterfaceAudience; +import org.apache.carbondata.common.annotations.InterfaceStability; + +/** + * This exception will be thrown when MV related SQL statement is invalid + */ +@InterfaceAudience.User +@InterfaceStability.Stable +public class MalformedMaterializedViewException extends MalformedCarbonCommandException { + /** Review comment: done This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3538: [CARBONDATA-3637] Optimize insert into flow
CarbonDataQA1 commented on issue #3538: [CARBONDATA-3637] Optimize insert into flow URL: https://github.com/apache/carbondata/pull/3538#issuecomment-586280885 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1993/
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3609: [CARBONDATA-3689] Support independent MV extension and MV syntax
CarbonDataQA1 commented on issue #3609: [CARBONDATA-3689] Support independent MV extension and MV syntax URL: https://github.com/apache/carbondata/pull/3609#issuecomment-586273872 Build Success with Spark 2.4.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/290/
[GitHub] [carbondata] jackylk commented on a change in pull request #3611: [CARBONDATA-3692] Support NoneCompression during loading data.
jackylk commented on a change in pull request #3611: [CARBONDATA-3692] Support NoneCompression during loading data. URL: https://github.com/apache/carbondata/pull/3611#discussion_r379408696 ## File path: core/src/main/java/org/apache/carbondata/core/datastore/compression/NoneCompressor.java ## @@ -0,0 +1,51 @@ +package org.apache.carbondata.core.datastore.compression; + +import java.io.IOException; + +public class NoneCompressor extends AbstractCompressor { Review comment: Please add a class-level description stating that this compressor does not perform any compression.
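A class-level description of the kind requested above might look like the following sketch. `PassThroughCompressor` and its method names are hypothetical and chosen for this example only; they do not reproduce the `AbstractCompressor` contract, whose methods are not shown in the quoted diff.

```java
import java.util.Arrays;

/**
 * A compressor that performs no compression. Both directions return a copy
 * of the input bytes unchanged, so it can be selected when the data should
 * be stored as-is (hypothetical illustration, not the PR's NoneCompressor).
 */
final class PassThroughCompressor {

  /** Name under which this compressor would be registered (assumed here to be "none"). */
  String getName() {
    return "none";
  }

  /** Returns a defensive copy of the input; no bytes are altered. */
  byte[] compress(byte[] input) {
    return Arrays.copyOf(input, input.length);
  }

  /** Returns a defensive copy of the input; no bytes are altered. */
  byte[] decompress(byte[] input) {
    return Arrays.copyOf(input, input.length);
  }
}
```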
[GitHub] [carbondata] jackylk commented on a change in pull request #3611: [CARBONDATA-3692] Support NoneCompression during loading data.
jackylk commented on a change in pull request #3611: [CARBONDATA-3692] Support NoneCompression during loading data. URL: https://github.com/apache/carbondata/pull/3611#discussion_r379408450 ## File path: core/src/main/java/org/apache/carbondata/core/datastore/compression/NoneCompressor.java ## @@ -0,0 +1,51 @@ +package org.apache.carbondata.core.datastore.compression; Review comment: Please add the license header, as in the other source files.
[GitHub] [carbondata] jackylk commented on a change in pull request #3611: [CARBONDATA-3692] Support NoneCompression during loading data.
jackylk commented on a change in pull request #3611: [CARBONDATA-3692] Support NoneCompression during loading data. URL: https://github.com/apache/carbondata/pull/3611#discussion_r379408170 ## File path: core/src/main/java/org/apache/carbondata/core/datastore/compression/CompressorFactory.java ## @@ -35,6 +35,7 @@ private final Map allSupportedCompressors = new HashMap<>(); public enum NativeSupportedCompressor { +NONE("none",NoneCompressor.class), Review comment: Add a space after `,`.
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3538: [CARBONDATA-3637] Optimize insert into flow
CarbonDataQA1 commented on issue #3538: [CARBONDATA-3637] Optimize insert into flow URL: https://github.com/apache/carbondata/pull/3538#issuecomment-586253092 Build Success with Spark 2.4.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/289/
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3609: [CARBONDATA-3689] Support independent MV extension and MV syntax
CarbonDataQA1 commented on issue #3609: [CARBONDATA-3689] Support independent MV extension and MV syntax URL: https://github.com/apache/carbondata/pull/3609#issuecomment-586245352 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1989/
[GitHub] [carbondata] ajantha-bhat commented on issue #3538: [CARBONDATA-3637] Optimize insert into flow
ajantha-bhat commented on issue #3538: [CARBONDATA-3637] Optimize insert into flow URL: https://github.com/apache/carbondata/pull/3538#issuecomment-586219611 retest this please
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3538: [CARBONDATA-3637] Optimize insert into flow
CarbonDataQA1 commented on issue #3538: [CARBONDATA-3637] Optimize insert into flow URL: https://github.com/apache/carbondata/pull/3538#issuecomment-586218935 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1988/
[GitHub] [carbondata] akashrn5 commented on a change in pull request #3584: [WIP] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache
akashrn5 commented on a change in pull request #3584: [WIP] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache URL: https://github.com/apache/carbondata/pull/3584#discussion_r379376668 ## File path: core/src/main/java/org/apache/carbondata/core/metadata/SegmentFileStore.java ## @@ -182,9 +184,14 @@ public static String genSegmentFileName(String segmentId, String UUID) { * @param UUID a UUID string used to construct the segment file name * @return segment file name Review comment: Please document the additional parameter `segmentMinMaxList` in this comment.
[GitHub] [carbondata] akashrn5 commented on a change in pull request #3584: [WIP] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache
akashrn5 commented on a change in pull request #3584: [WIP] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache URL: https://github.com/apache/carbondata/pull/3584#discussion_r379378987 ## File path: integration/spark-common/src/main/scala/org/apache/carbondata/spark/rdd/UpdateDataLoad.scala ## @@ -58,6 +61,9 @@ object UpdateDataLoad { loadMetadataDetails.setSegmentStatus(SegmentStatus.SUCCESS) val executor = new DataLoadExecutor TaskContext.get().addTaskCompletionListener { context => +accumulator.add(SegmentMinMaxStats.getInstance().getSegmentMinMaxMap. + asScala.mapValues(_.asScala.toList).toMap) +SegmentMinMaxStats.getInstance().clear() Review comment: same as above
[GitHub] [carbondata] akashrn5 commented on a change in pull request #3584: [WIP] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache
akashrn5 commented on a change in pull request #3584: [WIP] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache URL: https://github.com/apache/carbondata/pull/3584#discussion_r379341045 ## File path: core/src/main/java/org/apache/carbondata/core/indexstore/SegmentBlockInfo.java ## @@ -0,0 +1,45 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.core.indexstore; + +import java.util.List; +import java.util.Set; + +import org.apache.carbondata.core.util.SegmentMinMax; + +public class SegmentBlockInfo { Review comment: I think you could rename this class to `SegmentBlockIndexInfo` or `SegmentIndexInfo`.
[GitHub] [carbondata] akashrn5 commented on a change in pull request #3584: [WIP] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache
akashrn5 commented on a change in pull request #3584: [WIP] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache URL: https://github.com/apache/carbondata/pull/3584#discussion_r379264536 ## File path: core/src/main/java/org/apache/carbondata/core/datamap/Segment.java ## @@ -85,6 +86,8 @@ */ private transient Map options; + private List segmentMinMax; Review comment: Please add a comment describing what this field stores.
[GitHub] [carbondata] akashrn5 commented on a change in pull request #3584: [WIP] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache
akashrn5 commented on a change in pull request #3584: [WIP] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache URL: https://github.com/apache/carbondata/pull/3584#discussion_r379374364 ## File path: core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletDataMapFactory.java ## @@ -140,9 +149,66 @@ public DataMapBuilder createBuilder(Segment segment, String shardName, segmentMap.put(segment.getSegmentNo(), segment); Set identifiers = getTableBlockIndexUniqueIdentifiers(segment); - // get tableBlockIndexUniqueIdentifierWrappers from segment file info - getTableBlockUniqueIdentifierWrappers(partitionsToPrune, - tableBlockIndexUniqueIdentifierWrappers, identifiers); + if (null != partitionsToPrune && !partitionsToPrune.isEmpty()) { +// get tableBlockIndexUniqueIdentifierWrappers from segment file info +getTableBlockUniqueIdentifierWrappers(partitionsToPrune, +tableBlockIndexUniqueIdentifierWrappers, identifiers); + } else { +List segmentMinMaxList = segment.getSegmentMinMax(); +//boolean isLoadAllIndex = Boolean.parseBoolean(CarbonProperties.getInstance() +// .getProperty(CarbonCommonConstants.CARBON_LOAD_ALL_INDEX_TO_CACHE, +// CarbonCommonConstants.CARBON_LOAD_ALL_INDEX_TO_CACHE_DEFAULT)); +if (null != segmentMinMaxList && !filter.isEmpty() && null != filter && null == FilterUtil +.getImplicitFilterExpression(filter.getExpression())) { + boolean isScanRequired = false; + for (SegmentMinMax segmentMinMax : segmentMinMaxList) { +Map minValues = segmentMinMax.getMinValues(); +Map maxValues = segmentMinMax.getMaxValues(); +int length = minValues.size(); +List columnSchemas = new ArrayList<>(); Review comment: Why is this additional list required?
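The loop in the diff quoted above decides whether a segment needs to be scanned by checking the filter against per-block min/max values. A simplified sketch of that idea follows, assuming a single numeric column and an equality filter; `MinMaxPruning` and `Range` are hypothetical names, and the real code works on per-column byte arrays through CarbonData's filter utilities rather than on `long` values.

```java
import java.util.List;

// Simplified illustration of segment-level min/max pruning; not the PR's code.
final class MinMaxPruning {

  /** Min and max of one column within one block (hypothetical stand-in type). */
  static final class Range {
    final long min;
    final long max;

    Range(long min, long max) {
      this.min = min;
      this.max = max;
    }
  }

  /**
   * Returns true if at least one block range could contain the value,
   * meaning the segment cannot be skipped for the filter "column = value".
   */
  static boolean isScanRequired(List<Range> blockRanges, long value) {
    for (Range range : blockRanges) {
      if (value >= range.min && value <= range.max) {
        return true;
      }
    }
    // No block can match, so the whole segment can be pruned.
    return false;
  }
}
```

With ranges [1, 10] and [20, 30], a filter on value 15 lets the segment be skipped without loading its block indexes, which is where the driver memory saving named in the PR title would come from.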
[GitHub] [carbondata] akashrn5 commented on a change in pull request #3584: [WIP] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache
akashrn5 commented on a change in pull request #3584: [WIP] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache URL: https://github.com/apache/carbondata/pull/3584#discussion_r379343378 ## File path: core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletDataMapFactory.java ## @@ -158,6 +171,74 @@ public DataMapBuilder createBuilder(Segment segment, String shardName, return dataMaps; } + private void getTableBlockIndexUniqueIdentifierUsingSegmentPruning(DataMapFilter filter, + List tableBlockIndexUniqueIdentifierWrappers, + Segment segment, Set identifiers) { +List segmentMinMaxList = segment.getSegmentMinMax(); +//boolean isLoadAllIndex = Boolean.parseBoolean(CarbonProperties.getInstance() +// .getProperty(CarbonCommonConstants.CARBON_LOAD_ALL_INDEX_TO_CACHE, +// CarbonCommonConstants.CARBON_LOAD_ALL_INDEX_TO_CACHE_DEFAULT)); +if (null != segmentMinMaxList && !filter.isEmpty()) { Review comment: I think `filter` needs a null check here instead of the empty check.
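To illustrate the point about the null check: calling `isEmpty()` on a null `filter` throws a `NullPointerException`, so any null test has to come before the method call in the condition. The sketch below uses a hypothetical `Filter` stand-in rather than `DataMapFilter`, and keeps both checks, which is one possible reading of the comment rather than the change the reviewer necessarily had in mind.

```java
// Hypothetical illustration of guard ordering; Filter is not DataMapFilter.
final class FilterGuard {

  /** Minimal stand-in for a filter object; only isEmpty() matters here. */
  static final class Filter {
    private final String expression;

    Filter(String expression) {
      this.expression = expression;
    }

    boolean isEmpty() {
      return expression == null || expression.isEmpty();
    }
  }

  /** True only when a usable filter is present; safe to call with null. */
  static boolean canPrune(Filter filter) {
    // The null test comes first: && short-circuits, so isEmpty() is never
    // invoked on a null reference.
    return filter != null && !filter.isEmpty();
  }
}
```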
[GitHub] [carbondata] akashrn5 commented on a change in pull request #3584: [WIP] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache
akashrn5 commented on a change in pull request #3584: [WIP] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache URL: https://github.com/apache/carbondata/pull/3584#discussion_r379304868 ## File path: core/src/main/java/org/apache/carbondata/core/indexstore/SegmentBlockInfo.java ## @@ -0,0 +1,45 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.core.indexstore; + +import java.util.List; +import java.util.Set; + +import org.apache.carbondata.core.util.SegmentMinMax; + +public class SegmentBlockInfo { Review comment: Please add class-level and field-level comments.
[GitHub] [carbondata] akashrn5 commented on a change in pull request #3584: [WIP] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache
akashrn5 commented on a change in pull request #3584: [WIP] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache URL: https://github.com/apache/carbondata/pull/3584#discussion_r379339875 ## File path: core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletDataMapFactory.java ## @@ -211,15 +292,22 @@ private void getTableBlockUniqueIdentifierWrappers(List partition public Set getTableBlockIndexUniqueIdentifiers(Segment segment) Review comment: Replace the body of this method with the following and verify once:

    SegmentBlockInfo segmentBlockInfo = segmentMap.get(segment.getSegmentNo());
    Set tableBlockIndexUniqueIdentifiers;
    if (null != segmentBlockInfo) {
      segment.setSegmentMinMax(segmentMap.get(segment.getSegmentNo()).getSegmentMinMax());
      tableBlockIndexUniqueIdentifiers = segmentBlockInfo.getTableBlockIndexUniqueIdentifiers();
    } else {
      tableBlockIndexUniqueIdentifiers = BlockletDataMapUtil.getTableBlockUniqueIdentifiers(segment);
      if (tableBlockIndexUniqueIdentifiers.size() > 0) {
        segmentMap.put(segment.getSegmentNo(),
            new SegmentBlockInfo(tableBlockIndexUniqueIdentifiers, segment.getSegmentMinMax()));
      }
    }
    return tableBlockIndexUniqueIdentifiers;
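The suggested method body follows a look-up-then-load caching pattern: return the cached entry when present, otherwise load it and cache it only when the result is non-empty. A generic sketch of that pattern is shown below. `IdentifierCache`, its `loader` function and the `loads` counter are hypothetical names for this example, with plain strings standing in for the identifier types.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Generic illustration of the cache-or-load pattern; not CarbonData code.
final class IdentifierCache {
  private final Map<String, Set<String>> segmentMap = new HashMap<>();
  private final Function<String, Set<String>> loader;

  /** Number of times the loader was invoked; exposed only for the example. */
  int loads;

  IdentifierCache(Function<String, Set<String>> loader) {
    this.loader = loader;
  }

  /** Returns the identifiers for a segment, loading them on a cache miss. */
  Set<String> get(String segmentNo) {
    Set<String> cached = segmentMap.get(segmentNo);
    if (cached != null) {
      return cached;
    }
    loads++;
    Set<String> loaded = loader.apply(segmentNo);
    // Empty results are not cached, so a later call retries the load.
    if (!loaded.isEmpty()) {
      segmentMap.put(segmentNo, loaded);
    }
    return loaded;
  }
}
```

Not caching empty results mirrors the `size() > 0` guard in the suggestion; the trade-off is that segments with no identifiers are reloaded on every call.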
[GitHub] [carbondata] akashrn5 commented on a change in pull request #3584: [WIP] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache
akashrn5 commented on a change in pull request #3584: [WIP] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache URL: https://github.com/apache/carbondata/pull/3584#discussion_r379264459 ## File path: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java ## @@ -2333,4 +2333,9 @@ private CarbonCommonConstants() { * Default first day of week */ public static final String CARBON_TIMESERIES_FIRST_DAY_OF_WEEK_DEFAULT = "SUNDAY"; + Review comment: Please add a comment here.
[GitHub] [carbondata] akashrn5 commented on a change in pull request #3584: [WIP] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache
akashrn5 commented on a change in pull request #3584: [WIP] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache URL: https://github.com/apache/carbondata/pull/3584#discussion_r379378160 ## File path: core/src/main/java/org/apache/carbondata/core/util/SegmentMinMaxStats.java ## @@ -0,0 +1,61 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.carbondata.core.util; + +import java.util.ArrayList; +import java.util.HashMap; +import java.util.List; +import java.util.Map; + +/** + * Holds list of block level min max for each segment + */ +public class SegmentMinMaxStats { + + private SegmentMinMaxStats() { + } + + public static SegmentMinMaxStats getInstance() { +return segmentMinMaxStats; + } + + private Map<String, List<SegmentMinMax>> segmentMinMaxMap = new HashMap<>(); + + private static final SegmentMinMaxStats segmentMinMaxStats = new SegmentMinMaxStats(); + + public Map<String, List<SegmentMinMax>> getSegmentMinMaxMap() { +return segmentMinMaxMap; + } + + public void setSegmentMinMaxList(String segmentId, Map minValues, + Map maxValues) { +if (this.segmentMinMaxMap.get(segmentId) == null) { + List segmentMinMaxList = new ArrayList<>(); + segmentMinMaxList.add(new SegmentMinMax(minValues, maxValues)); + this.segmentMinMaxMap.put(segmentId, segmentMinMaxList); +} else { + this.segmentMinMaxMap.get(segmentId) + .add(new SegmentMinMax(minValues, maxValues)); +} + } + + public void clear() { Review comment: This method can be removed.
[GitHub] [carbondata] akashrn5 commented on a change in pull request #3584: [WIP] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache
akashrn5 commented on a change in pull request #3584: [WIP] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache URL: https://github.com/apache/carbondata/pull/3584#discussion_r379377714 ## File path: core/src/main/java/org/apache/carbondata/core/metadata/SegmentFileStore.java ## @@ -1265,6 +1292,21 @@ void addPath(String path, FolderDetails details) { public void setOptions(Map options) { this.options = options; } + +public List getSegmentMinMax() { + List segmentMinMaxList = null; + try { +segmentMinMaxList = +(List) ObjectSerializationUtil.convertStringToObject(segmentMinMax); + } catch (IOException e) { +LOGGER.error("Error while getting segment minmax"); + } + return segmentMinMaxList; Review comment: I recommend returning an empty list instead of null and removing the null checks everywhere it is used.
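The suggestion in this review comment (return an empty list rather than null) could look roughly like the sketch below. It is not the PR's code: `SegmentMinMaxHolder`, its `serialize` helper, the `List<String>` element type and the use of plain Java serialization with Base64 are all hypothetical stand-ins for `SegmentFileStore` and `ObjectSerializationUtil.convertStringToObject`, whose exact behaviour is not shown in the diff.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Base64;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for the class that keeps the serialized segment min/max string.
class SegmentMinMaxHolder {
  private final String serialized;

  SegmentMinMaxHolder(String serialized) {
    this.serialized = serialized;
  }

  // Hypothetical helper (demo only): serializes a list into a Base64 string.
  static String serialize(List<String> values) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      out.writeObject(new ArrayList<>(values));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return Base64.getEncoder().encodeToString(bytes.toByteArray());
  }

  // Never returns null, so callers can iterate without a null check.
  @SuppressWarnings("unchecked")
  List<String> getSegmentMinMax() {
    if (serialized == null) {
      return Collections.emptyList();
    }
    try (ObjectInputStream in = new ObjectInputStream(
        new ByteArrayInputStream(Base64.getDecoder().decode(serialized)))) {
      return (List<String>) in.readObject();
    } catch (IOException | ClassNotFoundException | IllegalArgumentException e) {
      // Log the failure and fall back to an empty list instead of null.
      System.err.println("Error while getting segment minmax: " + e);
      return Collections.emptyList();
    }
  }
}
```

With this shape, a caller can write `for (String entry : holder.getSegmentMinMax())` directly; a missing or unreadable value simply yields zero iterations.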
[GitHub] [carbondata] akashrn5 commented on a change in pull request #3584: [WIP] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache
akashrn5 commented on a change in pull request #3584: [WIP] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache URL: https://github.com/apache/carbondata/pull/3584#discussion_r379377880 ## File path: core/src/main/java/org/apache/carbondata/core/util/SegmentMinMax.java ## @@ -0,0 +1,58 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.carbondata.core.util; + +import java.io.Serializable; +import java.util.Map; + +/** + * Holds Min, Max and columnCardinality values for each segment block + */ +public class SegmentMinMax implements Serializable { + + /** + * Map of column names and it's block level min values + */ + private Map minValues; + + /** + * Map of column names and it's block level max values + */ + private Map maxValues; + + SegmentMinMax(Map minValues, Map maxValues) { +this.minValues = minValues; +this.maxValues = maxValues; + } + + public Map getMinValues() { +return minValues; + } + + public void setMinValues(Map minValues) { +this.minValues = minValues; + } + + public Map getMaxValues() { +return maxValues; + } + + public void setMaxValues(Map maxValues) { +this.maxValues = maxValues; + } +} Review comment: Add a new line at the end of the class.
[GitHub] [carbondata] akashrn5 commented on a change in pull request #3584: [WIP] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache
akashrn5 commented on a change in pull request #3584: [WIP] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache URL: https://github.com/apache/carbondata/pull/3584#discussion_r379264433 ## File path: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java ## @@ -2333,4 +2333,9 @@ private CarbonCommonConstants() { * Default first day of week */ public static final String CARBON_TIMESERIES_FIRST_DAY_OF_WEEK_DEFAULT = "SUNDAY"; + + public static final String CARBON_LOAD_ALL_INDEX_TO_CACHE = "carbon.load.all.indexes.to.cache"; + + public static final String CARBON_LOAD_ALL_INDEX_TO_CACHE_DEFAULT = "true"; Review comment: Add a comment explaining when to set this to false.
[GitHub] [carbondata] akashrn5 commented on a change in pull request #3584: [WIP] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache
akashrn5 commented on a change in pull request #3584: [WIP] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache URL: https://github.com/apache/carbondata/pull/3584#discussion_r379378744 ## File path: integration/spark-common/src/main/scala/org/apache/carbondata/spark/rdd/InsertTaskCompletionListener.scala ## @@ -17,19 +17,27 @@ package org.apache.carbondata.spark.rdd +import scala.collection.JavaConverters._ + import org.apache.spark.TaskContext import org.apache.spark.sql.carbondata.execution.datasources.tasklisteners.CarbonLoadTaskCompletionListener import org.apache.spark.sql.execution.command.ExecutionErrors +import org.apache.spark.util.CollectionAccumulator -import org.apache.carbondata.core.util.{DataTypeUtil, ThreadLocalTaskInfo} +import org.apache.carbondata.core.util.{DataTypeUtil, SegmentMinMax, SegmentMinMaxStats, ThreadLocalTaskInfo} import org.apache.carbondata.processing.loading.{DataLoadExecutor, FailureCauses} import org.apache.carbondata.spark.util.CommonUtil class InsertTaskCompletionListener(dataLoadExecutor: DataLoadExecutor, -executorErrors: ExecutionErrors) +executorErrors: ExecutionErrors, +accumulator: CollectionAccumulator[Map[String, List[SegmentMinMax]]]) extends CarbonLoadTaskCompletionListener { override def onTaskCompletion(context: TaskContext): Unit = { try { + // add segment level minMax to accumulator + accumulator.add(SegmentMinMaxStats.getInstance().getSegmentMinMaxMap. +asScala.mapValues(_.asScala.toList).toMap) + SegmentMinMaxStats.getInstance().clear() Review comment: This can just be `map.clear`.
[GitHub] [carbondata] akashrn5 commented on a change in pull request #3584: [WIP] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache
akashrn5 commented on a change in pull request #3584: [WIP] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache URL: https://github.com/apache/carbondata/pull/3584#discussion_r379317644 ## File path: core/src/main/java/org/apache/carbondata/core/util/SegmentMinMaxStats.java ## @@ -0,0 +1,61 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.core.util; + +import java.util.ArrayList; +import java.util.HashMap; +import java.util.List; +import java.util.Map; + +/** + * Holds list of block level min max for each segment + */ +public class SegmentMinMaxStats { + + private SegmentMinMaxStats() { + } + + public static SegmentMinMaxStats getInstance() { Review comment: In getInstance, create the object and the map only once and then reuse them for each load; just clear the map entries once they have been added to the accumulator.
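One way to read this comment (create the holder once, reuse it for each load, and clear the entries after handing them to the accumulator) is sketched below. This is an illustration under assumptions, not the PR's implementation: `MinMaxStatsSketch`, `add` and `drain` are hypothetical names, the values are plain strings instead of `SegmentMinMax` objects, and the `synchronized` methods are an added precaution that the diff does not show.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stats holder; values are strings rather than SegmentMinMax.
final class MinMaxStatsSketch {
  // Created exactly once when the class is initialized, then reused for every load.
  private static final MinMaxStatsSketch INSTANCE = new MinMaxStatsSketch();

  private final Map<String, List<String>> statsBySegment = new HashMap<>();

  private MinMaxStatsSketch() {
  }

  static MinMaxStatsSketch getInstance() {
    return INSTANCE;
  }

  // Records one entry for the given segment.
  synchronized void add(String segmentId, String stat) {
    statsBySegment.computeIfAbsent(segmentId, id -> new ArrayList<>()).add(stat);
  }

  // Returns a copy of everything collected so far and clears the shared map,
  // so the same instance starts empty for the next load.
  synchronized Map<String, List<String>> drain() {
    Map<String, List<String>> snapshot = new HashMap<>();
    for (Map.Entry<String, List<String>> entry : statsBySegment.entrySet()) {
      snapshot.put(entry.getKey(), new ArrayList<>(entry.getValue()));
    }
    statsBySegment.clear();
    return snapshot;
  }
}
```

Copying before clearing matters here: if the accumulator were handed the live map, the later `clear()` would also empty what the accumulator holds.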
[GitHub] [carbondata] akashrn5 commented on a change in pull request #3584: [WIP] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache
akashrn5 commented on a change in pull request #3584: [WIP] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache URL: https://github.com/apache/carbondata/pull/3584#discussion_r379378567 ## File path: core/src/main/java/org/apache/carbondata/core/util/SegmentMinMaxStats.java ## @@ -0,0 +1,61 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.carbondata.core.util; + +import java.util.ArrayList; +import java.util.HashMap; +import java.util.List; +import java.util.Map; + +/** + * Holds list of block level min max for each segment + */ +public class SegmentMinMaxStats { + + private SegmentMinMaxStats() { + } + + public static SegmentMinMaxStats getInstance() { +return segmentMinMaxStats; + } + + private Map<String, List<SegmentMinMax>> segmentMinMaxMap = new HashMap<>(); + + private static final SegmentMinMaxStats segmentMinMaxStats = new SegmentMinMaxStats(); + + public Map<String, List<SegmentMinMax>> getSegmentMinMaxMap() { +return segmentMinMaxMap; + } + + public void setSegmentMinMaxList(String segmentId, Map minValues, + Map maxValues) { +if (this.segmentMinMaxMap.get(segmentId) == null) { + List segmentMinMaxList = new ArrayList<>(); + segmentMinMaxList.add(new SegmentMinMax(minValues, maxValues)); + this.segmentMinMaxMap.put(segmentId, segmentMinMaxList); +} else { + this.segmentMinMaxMap.get(segmentId) Review comment: The if check already tests whether `this.segmentMinMaxMap.get(segmentId)` is null, so put the segment id in the map here; otherwise a NullPointerException can occur.
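The get/null-check/put sequence discussed in this comment can be collapsed into a single `Map.computeIfAbsent` call, which removes the second lookup and leaves no branch where the list could be missing. The sketch below is only an illustration: `SegmentListMapSketch`, `addEntry` and `entriesFor` are hypothetical names, and strings stand in for `SegmentMinMax` objects.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified version of the per-segment list map.
class SegmentListMapSketch {
  private final Map<String, List<String>> segmentMinMaxMap = new HashMap<>();

  // Creates the list on first use for a segment, then appends to it.
  void addEntry(String segmentId, String entry) {
    segmentMinMaxMap.computeIfAbsent(segmentId, id -> new ArrayList<>()).add(entry);
  }

  // Returns the entries for a segment, or an empty list if none were recorded.
  List<String> entriesFor(String segmentId) {
    return segmentMinMaxMap.getOrDefault(segmentId, Collections.emptyList());
  }
}
```

`computeIfAbsent` has been part of `java.util.Map` since Java 8, so this only applies if the module targets Java 8 or later.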
[GitHub] [carbondata] akashrn5 commented on a change in pull request #3584: [WIP] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache
akashrn5 commented on a change in pull request #3584: [WIP] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache URL: https://github.com/apache/carbondata/pull/3584#discussion_r379346187 ## File path: core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletDataMapFactory.java ## @@ -158,6 +171,74 @@ public DataMapBuilder createBuilder(Segment segment, String shardName, return dataMaps; } + private void getTableBlockIndexUniqueIdentifierUsingSegmentPruning(DataMapFilter filter, Review comment: Better to rename the method `getTableBlockIndexUniqueIdentifierUsingSegmentPruning` to `getTableBlockIndexUniqueIdentifierUsingSegmentMinMax`.
[GitHub] [carbondata] akashrn5 commented on a change in pull request #3584: [WIP] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache
akashrn5 commented on a change in pull request #3584: [WIP] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache URL: https://github.com/apache/carbondata/pull/3584#discussion_r379379224 ## File path: integration/spark2/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala ## @@ -315,7 +316,10 @@ object CarbonDataRDDFactory { val carbonTable = carbonLoadModel.getCarbonDataLoadSchema.getCarbonTable var status: Array[(String, (LoadMetadataDetails, ExecutionErrors))] = null var res: Array[List[(String, (LoadMetadataDetails, ExecutionErrors))]] = null - +// accumulator to collect segment minmax +val minMaxAccumulator = sqlContext Review comment: Rename to `segmentMinMaxAccumulator`.
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3584: [WIP] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache
CarbonDataQA1 commented on issue #3584: [WIP] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache URL: https://github.com/apache/carbondata/pull/3584#issuecomment-586215511 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1990/
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3584: [WIP] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache
CarbonDataQA1 commented on issue #3584: [WIP] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache URL: https://github.com/apache/carbondata/pull/3584#issuecomment-586207985 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1992/
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3584: [WIP] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache
CarbonDataQA1 commented on issue #3584: [WIP] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache URL: https://github.com/apache/carbondata/pull/3584#issuecomment-586207774 Build Failed with Spark 2.4.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/288/
[GitHub] [carbondata] CarbonDataQA1 commented on issue #1363: [CARBONDATA-1378] support creating table in hive
CarbonDataQA1 commented on issue #1363: [CARBONDATA-1378] support creating table in hive URL: https://github.com/apache/carbondata/pull/1363#issuecomment-586204474 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1991/
[GitHub] [carbondata] Pickupolddriver commented on a change in pull request #3611: [CARBONDATA-3692] Support NoneCompression during loading data.
Pickupolddriver commented on a change in pull request #3611: [CARBONDATA-3692] Support NoneCompression during loading data. URL: https://github.com/apache/carbondata/pull/3611#discussion_r379335042 ## File path: integration/spark-common-test/src/test/scala/org/apache/carbondata/integration/spark/testsuite/dataload/TestLoadDataWithCompression.scala ## @@ -272,6 +272,79 @@ class TestLoadDataWithCompression extends QueryTest with BeforeAndAfterEach with } } + test("test current none compressor on legacy store with snappy") { + CarbonProperties.getInstance().addProperty(CarbonCommonConstants.ENABLE_OFFHEAP_SORT, "true") + CarbonProperties.getInstance().addProperty(CarbonCommonConstants.COMPRESSOR, "snappy") +createTable() +loadData() + + CarbonProperties.getInstance().addProperty(CarbonCommonConstants.ENABLE_OFFHEAP_SORT, "true") + CarbonProperties.getInstance().addProperty(CarbonCommonConstants.COMPRESSOR, "none") +loadData() +checkAnswer(sql(s"SELECT count(*) FROM $tableName"), Seq(Row(16))) Review comment: Sure
[GitHub] [carbondata] asfgit closed pull request #3619: [HOTFIX] Remove unused fields in TableInfo
asfgit closed pull request #3619: [HOTFIX] Remove unused fields in TableInfo URL: https://github.com/apache/carbondata/pull/3619
[GitHub] [carbondata] Pickupolddriver commented on issue #3611: [CARBONDATA-3692] Support NoneCompression during loading data.
Pickupolddriver commented on issue #3611: [CARBONDATA-3692] Support NoneCompression during loading data. URL: https://github.com/apache/carbondata/pull/3611#issuecomment-586201467
> @Pickupolddriver : Agree that it can improve the loading speed. But data will be 3x bigger. So, storage cost on OBS will be 3x more!
The data would be processed after it is loaded to OBS. So if we could provide a NonCompressor, it would avoid the data being compressed and then uncompressed again. The uncompressed data would be deleted from OBS after it has been processed.
[GitHub] [carbondata] akashrn5 commented on issue #3619: [HOTFIX] Remove unused fields in TableInfo
akashrn5 commented on issue #3619: [HOTFIX] Remove unused fields in TableInfo URL: https://github.com/apache/carbondata/pull/3619#issuecomment-586199051 LGTM
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3609: [CARBONDATA-3689] Support independent MV extension and MV syntax
CarbonDataQA1 commented on issue #3609: [CARBONDATA-3689] Support independent MV extension and MV syntax URL: https://github.com/apache/carbondata/pull/3609#issuecomment-586197154 Build Success with Spark 2.4.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/286/
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3538: [CARBONDATA-3637] Optimize insert into flow
CarbonDataQA1 commented on issue #3538: [CARBONDATA-3637] Optimize insert into flow URL: https://github.com/apache/carbondata/pull/3538#issuecomment-586188469 Build Success with Spark 2.4.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/285/
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3619: [HOTFIX] Remove unused fields in TableInfo
CarbonDataQA1 commented on issue #3619: [HOTFIX] Remove unused fields in TableInfo URL: https://github.com/apache/carbondata/pull/3619#issuecomment-586179412 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1987/
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3609: [CARBONDATA-3689] Support independent MV extension and MV syntax
CarbonDataQA1 commented on issue #3609: [CARBONDATA-3689] Support independent MV extension and MV syntax URL: https://github.com/apache/carbondata/pull/3609#issuecomment-586175115 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1985/
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3619: [HOTFIX] Remove unused fields in TableInfo
CarbonDataQA1 commented on issue #3619: [HOTFIX] Remove unused fields in TableInfo URL: https://github.com/apache/carbondata/pull/3619#issuecomment-586147787 Build Success with Spark 2.4.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/284/
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3609: [CARBONDATA-3689] Support independent MV extension and MV syntax
CarbonDataQA1 commented on issue #3609: [CARBONDATA-3689] Support independent MV extension and MV syntax URL: https://github.com/apache/carbondata/pull/3609#issuecomment-586144674 Build Success with Spark 2.4.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/282/