[GitHub] [carbondata] CarbonDataQA1 commented on issue #3603: [CARBONDATA-3679] Optimize local sort performance

2020-02-14 Thread GitBox
CarbonDataQA1 commented on issue #3603: [CARBONDATA-3679] Optimize local sort 
performance
URL: https://github.com/apache/carbondata/pull/3603#issuecomment-586560551
 
 
   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2008/
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [carbondata] CarbonDataQA1 commented on issue #3612: [CARBONDATA-3694] Separate Materialized View command from DataMap command

2020-02-14 Thread GitBox
CarbonDataQA1 commented on issue #3612: [CARBONDATA-3694] Separate Materialized 
View command from DataMap command
URL: https://github.com/apache/carbondata/pull/3612#issuecomment-586558926
 
 
   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2005/
   




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3609: [CARBONDATA-3689] Support independent MV extension and MV syntax

2020-02-14 Thread GitBox
CarbonDataQA1 commented on issue #3609: [CARBONDATA-3689] Support independent 
MV extension and MV syntax
URL: https://github.com/apache/carbondata/pull/3609#issuecomment-586558231
 
 
   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2004/
   




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3603: [CARBONDATA-3679] Optimize local sort performance

2020-02-14 Thread GitBox
CarbonDataQA1 commented on issue #3603: [CARBONDATA-3679] Optimize local sort 
performance
URL: https://github.com/apache/carbondata/pull/3603#issuecomment-586556854
 
 
   Build Success with Spark 2.4.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/304/
   




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3603: [CARBONDATA-3679] Optimize local sort performance

2020-02-14 Thread GitBox
CarbonDataQA1 commented on issue #3603: [CARBONDATA-3679] Optimize local sort 
performance
URL: https://github.com/apache/carbondata/pull/3603#issuecomment-586555694
 
 
   Build Success with Spark 2.4.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/303/
   




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3622: [CARBONDATA-3702] Clean temp index files in parallel in merge index flow

2020-02-14 Thread GitBox
CarbonDataQA1 commented on issue #3622: [CARBONDATA-3702] Clean temp index 
files in parallel in merge index flow
URL: https://github.com/apache/carbondata/pull/3622#issuecomment-586554100
 
 
   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2003/
   




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3612: [CARBONDATA-3694] Separate Materialized View command from DataMap command

2020-02-14 Thread GitBox
CarbonDataQA1 commented on issue #3612: [CARBONDATA-3694] Separate Materialized 
View command from DataMap command
URL: https://github.com/apache/carbondata/pull/3612#issuecomment-586553862
 
 
   Build Success with Spark 2.4.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/301/
   




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3609: [CARBONDATA-3689] Support independent MV extension and MV syntax

2020-02-14 Thread GitBox
CarbonDataQA1 commented on issue #3609: [CARBONDATA-3689] Support independent 
MV extension and MV syntax
URL: https://github.com/apache/carbondata/pull/3609#issuecomment-586553571
 
 
   Build Success with Spark 2.4.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/300/
   




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3603: [CARBONDATA-3679] Optimize local sort performance

2020-02-14 Thread GitBox
CarbonDataQA1 commented on issue #3603: [CARBONDATA-3679] Optimize local sort 
performance
URL: https://github.com/apache/carbondata/pull/3603#issuecomment-586553316
 
 
   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2006/
   




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3603: [CARBONDATA-3679] Optimize local sort performance

2020-02-14 Thread GitBox
CarbonDataQA1 commented on issue #3603: [CARBONDATA-3679] Optimize local sort 
performance
URL: https://github.com/apache/carbondata/pull/3603#issuecomment-586553206
 
 
   Build Failed  with Spark 2.4.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/302/
   




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3615: [CARBONDATA-3637] Use optimized insert flow for MV and insert stage command

2020-02-14 Thread GitBox
CarbonDataQA1 commented on issue #3615: [CARBONDATA-3637] Use optimized insert 
flow for MV and insert stage command
URL: https://github.com/apache/carbondata/pull/3615#issuecomment-586552805
 
 
   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2002/
   




[jira] [Closed] (CARBONDATA-3689) Support independent MV Extension for Spark

2020-02-14 Thread Jacky Li (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacky Li closed CARBONDATA-3689.

Resolution: Not A Problem

> Support independent MV Extension for Spark
> --
>
> Key: CARBONDATA-3689
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3689
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: Jacky Li
>Priority: Major
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> To better promote Materialized View usage, we can make Materialized View
> an independent extension for Apache Spark.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [carbondata] jackylk closed pull request #3609: [CARBONDATA-3689] Support independent MV extension and MV syntax

2020-02-14 Thread GitBox
jackylk closed pull request #3609: [CARBONDATA-3689] Support independent MV 
extension and MV syntax
URL: https://github.com/apache/carbondata/pull/3609
 
 
   




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3622: [CARBONDATA-3702] Clean temp index files in parallel in merge index flow

2020-02-14 Thread GitBox
CarbonDataQA1 commented on issue #3622: [CARBONDATA-3702] Clean temp index 
files in parallel in merge index flow
URL: https://github.com/apache/carbondata/pull/3622#issuecomment-586551417
 
 
   Build Success with Spark 2.4.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/299/
   




[jira] [Created] (CARBONDATA-3703) Support insert stage in parallel

2020-02-14 Thread Xingjun Hao (Jira)
Xingjun Hao created CARBONDATA-3703:
---

 Summary: Support insert stage in parallel
 Key: CARBONDATA-3703
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3703
 Project: CarbonData
  Issue Type: Improvement
Reporter: Xingjun Hao








[GitHub] [carbondata] marchpure opened a new pull request #3622: [CARBONDATA-3702] Clean temp index files in parallel in merge index flow

2020-02-14 Thread GitBox
marchpure opened a new pull request #3622: [CARBONDATA-3702] Clean temp index 
files in parallel in merge index flow
URL: https://github.com/apache/carbondata/pull/3622
 
 
### Why is this PR needed?
Currently, cleaning temp index files in the merge index flow takes a long time 
(sometimes 2-3 minutes), which should be optimized.
   
### What changes were proposed in this PR?
   Clean temp index files in parallel in merge index flow
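A minimal sketch of the idea, not code from the PR: submit each deletion as its own task so slow storage round trips overlap instead of running one after another. It assumes the temp index files are reachable as `java.nio.file.Path`s; the real change likely goes through CarbonData's own file abstractions, and the class and method names here are hypothetical.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper illustrating parallel cleanup of temp index files.
final class ParallelTempIndexCleaner {
  private ParallelTempIndexCleaner() {}

  /**
   * Deletes every path in tempIndexFiles on a fixed pool of worker threads
   * and returns how many files were actually removed. A failure on any file
   * surfaces as an ExecutionException from Future.get().
   */
  static int deleteInParallel(List<Path> tempIndexFiles, int threads)
      throws InterruptedException, ExecutionException {
    ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, threads));
    try {
      List<Future<Boolean>> results = new ArrayList<>();
      for (Path file : tempIndexFiles) {
        // Each deletion is an independent task, so the calls overlap.
        Callable<Boolean> task = () -> Files.deleteIfExists(file);
        results.add(pool.submit(task));
      }
      int deleted = 0;
      for (Future<Boolean> result : results) {
        // Wait for every task; count only files that existed and were removed.
        if (result.get()) {
          deleted++;
        }
      }
      return deleted;
    } finally {
      pool.shutdown();
    }
  }
}
```

The pool size is a tuning knob: on object stores where each delete is a network call, more threads than cores can still help, while on local disk a small pool is usually enough.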
   
### Does this PR introduce any user interface change?
- No
   
### Is any new testcase added?
- No
   
   
   




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3615: [CARBONDATA-3637] Use optimized insert flow for MV and insert stage command

2020-02-14 Thread GitBox
CarbonDataQA1 commented on issue #3615: [CARBONDATA-3637] Use optimized insert 
flow for MV and insert stage command
URL: https://github.com/apache/carbondata/pull/3615#issuecomment-586549542
 
 
   Build Success with Spark 2.4.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/298/
   




[jira] [Created] (CARBONDATA-3702) Clean temp index files in parallel in merge index flow

2020-02-14 Thread Xingjun Hao (Jira)
Xingjun Hao created CARBONDATA-3702:
---

 Summary: Clean temp index files in parallel in merge index flow
 Key: CARBONDATA-3702
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3702
 Project: CarbonData
  Issue Type: Improvement
Reporter: Xingjun Hao


Currently, cleaning temp index files in the merge index flow takes a long time 
(sometimes 2-3 minutes), which should be optimized.





[GitHub] [carbondata] QiangCai commented on issue #3538: [CARBONDATA-3637] Optimize insert into flow

2020-02-14 Thread GitBox
QiangCai commented on issue #3538: [CARBONDATA-3637] Optimize insert into flow
URL: https://github.com/apache/carbondata/pull/3538#issuecomment-586535639
 
 
   LGTM




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3621: [HOTFIX] Support both listfile() and listfile(maxCount) in InsertStag…

2020-02-14 Thread GitBox
CarbonDataQA1 commented on issue #3621: [HOTFIX] Support both listfile() and 
listfile(maxCount) in InsertStag…
URL: https://github.com/apache/carbondata/pull/3621#issuecomment-586432483
 
 
   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2000/
   




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3584: [WIP] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache

2020-02-14 Thread GitBox
CarbonDataQA1 commented on issue #3584: [WIP] Support SegmentLevel MinMax for 
better Pruning and less driver memory usage for cache
URL: https://github.com/apache/carbondata/pull/3584#issuecomment-586429154
 
 
   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2001/
   




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3584: [WIP] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache

2020-02-14 Thread GitBox
CarbonDataQA1 commented on issue #3584: [WIP] Support SegmentLevel MinMax for 
better Pruning and less driver memory usage for cache
URL: https://github.com/apache/carbondata/pull/3584#issuecomment-586398107
 
 
   Build Success with Spark 2.4.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/297/
   




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3621: [HOTFIX] Support both listfile() and listfile(maxCount) in InsertStag…

2020-02-14 Thread GitBox
CarbonDataQA1 commented on issue #3621: [HOTFIX] Support both listfile() and 
listfile(maxCount) in InsertStag…
URL: https://github.com/apache/carbondata/pull/3621#issuecomment-586395416
 
 
   Build Success with Spark 2.4.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/296/
   




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3620: [CARBONDATA-3700] Optimize pruning performance when prunning with multi…

2020-02-14 Thread GitBox
CarbonDataQA1 commented on issue #3620: [CARBONDATA-3700] Optimize pruning 
performance when prunning with multi…
URL: https://github.com/apache/carbondata/pull/3620#issuecomment-586385984
 
 
   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1998/
   




[GitHub] [carbondata] marchpure opened a new pull request #3621: [HOTFIX] Support both listfile() and listfile(maxCount) in InsertStag…

2020-02-14 Thread GitBox
marchpure opened a new pull request #3621: [HOTFIX] Support both listfile() and 
listfile(maxCount) in InsertStag…
URL: https://github.com/apache/carbondata/pull/3621
 
 
   …e flow
   
### Why is this PR needed?
Flink writes files to OBS with filenames ordered by time. When stage files are 
loaded into CarbonData, they are listed and loaded with the parameter 
"maxcount", which returns the "maxcount" files with the smallest filenames, in 
other words, the "maxcount" files with the earliest generation time. However, in 
some scenarios the filenames of stage files are not in order of time, so a 
switch is used to decide whether to list files with the specified batch size or 
simply list all files in the stage directory.

### What changes were proposed in this PR?
A switch "CARBON_STAGE_FILENAME_IS_IN_ORDER_OF_TIME" is added.
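The selection rule described above can be sketched as a pure function over the names found in the stage directory. This is an illustration under assumptions, not the PR's code: it works on an in-memory list instead of a real OBS listing, sorts names explicitly rather than relying on the storage listing order, and uses a plain boolean as a hypothetical stand-in for the CARBON_STAGE_FILENAME_IS_IN_ORDER_OF_TIME switch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper; names are illustrative only.
final class StageFileSelector {
  private StageFileSelector() {}

  /**
   * Chooses which stage files to load in one batch.
   *
   * @param stageFileNames     names found in the stage directory
   * @param namesInOrderOfTime stand-in for the time-ordered-filename switch
   * @param maxCount           batch size used when names sort by creation time
   */
  static List<String> select(List<String> stageFileNames, boolean namesInOrderOfTime, int maxCount) {
    List<String> names = new ArrayList<>(stageFileNames);
    if (!namesInOrderOfTime) {
      // Names carry no time information, so taking the "smallest" names could
      // skip older files; load everything currently listed instead.
      return names;
    }
    // Names sort by creation time, so the smallest names are the oldest files.
    Collections.sort(names);
    int batch = Math.max(0, Math.min(maxCount, names.size()));
    return new ArrayList<>(names.subList(0, batch));
  }
}
```

With the switch off, a batch never silently skips an older file whose name happens to sort late, at the cost of loading the whole directory in one go.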
   
### Does this PR introduce any user interface change?
- No
   
### Is any new testcase added?
- No
   
   
   




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3584: [WIP] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache

2020-02-14 Thread GitBox
CarbonDataQA1 commented on issue #3584: [WIP] Support SegmentLevel MinMax for 
better Pruning and less driver memory usage for cache
URL: https://github.com/apache/carbondata/pull/3584#issuecomment-586383567
 
 
   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1999/
   




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3584: [WIP] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache

2020-02-14 Thread GitBox
CarbonDataQA1 commented on issue #3584: [WIP] Support SegmentLevel MinMax for 
better Pruning and less driver memory usage for cache
URL: https://github.com/apache/carbondata/pull/3584#issuecomment-586352268
 
 
   Build Success with Spark 2.4.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/295/
   




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3620: [CARBONDATA-3700] Optimize pruning performance when prunning with multi…

2020-02-14 Thread GitBox
CarbonDataQA1 commented on issue #3620: [CARBONDATA-3700] Optimize pruning 
performance when prunning with multi…
URL: https://github.com/apache/carbondata/pull/3620#issuecomment-586351917
 
 
   Build Success with Spark 2.4.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/294/
   




[GitHub] [carbondata] marchpure commented on issue #3620: [CARBONDATA-3700] Optimize pruning performance when prunning with multi…

2020-02-14 Thread GitBox
marchpure commented on issue #3620: [CARBONDATA-3700] Optimize pruning 
performance when prunning with multi…
URL: https://github.com/apache/carbondata/pull/3620#issuecomment-586341516
 
 
   retest this please




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3620: [CARBONDATA-3700] Optimize pruning performance when prunning with multi…

2020-02-14 Thread GitBox
CarbonDataQA1 commented on issue #3620: [CARBONDATA-3700] Optimize pruning 
performance when prunning with multi…
URL: https://github.com/apache/carbondata/pull/3620#issuecomment-586339554
 
 
   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1997/
   




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3620: [CARBONDATA-3700] Optimize pruning performance when prunning with multi…

2020-02-14 Thread GitBox
CarbonDataQA1 commented on issue #3620: [CARBONDATA-3700] Optimize pruning 
performance when prunning with multi…
URL: https://github.com/apache/carbondata/pull/3620#issuecomment-586338942
 
 
   Build Failed  with Spark 2.4.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/293/
   




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3612: [CARBONDATA-3694] Separate Materialized View command from DataMap command

2020-02-14 Thread GitBox
CarbonDataQA1 commented on issue #3612: [CARBONDATA-3694] Separate Materialized 
View command from DataMap command
URL: https://github.com/apache/carbondata/pull/3612#issuecomment-586336996
 
 
   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1995/
   




[GitHub] [carbondata] marchpure opened a new pull request #3620: [CARBONDATA-3700] Optimize prune performance when prunning with multi…

2020-02-14 Thread GitBox
marchpure opened a new pull request #3620: [CARBONDATA-3700] Optimize prune 
performance when prunning with multi…
URL: https://github.com/apache/carbondata/pull/3620
 
 
   …-threads
   
   Why is this PR needed?
   When pruning with multiple threads, a bug severely hampers pruning 
performance.
   When pruning finds no blocklets matching the query filter, the 
getExtendblocklet function is still triggered to fetch the extended blocklet 
metadata. For an empty input blocklet list, this function is expected to return 
an empty extended blocklet list directly, but a bug causes a meaningless 
"hashset add operation" overhead instead.
   In addition, when pruning with multiple threads, the getExtendblocklet 
function is triggered once per blocklet; this should be avoided by triggering it 
once per segment.
   
   What changes were proposed in this PR?
   1) If the input to the getExtendblocklet function is an empty blocklet list, 
return an empty extended blocklet list directly.
   2) Trigger the getExtendblocklet function once per segment instead of once 
per blocklet.
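   A minimal sketch of the two proposed changes, under assumptions: Blocklet is a hypothetical two-field stand-in for CarbonData's blocklet classes, and loadSegmentMetadata is a hypothetical callback standing in for the expensive metadata lookup. It shows the early return on empty input and the once-per-segment grouping; it is not the PR's implementation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical illustration; not CarbonData's actual classes or signatures.
final class ExtendedBlockletSketch {
  private ExtendedBlockletSketch() {}

  /** Hypothetical pruned blocklet holding only what this sketch needs. */
  static final class Blocklet {
    final String segmentId;
    final String blockletId;

    Blocklet(String segmentId, String blockletId) {
      this.segmentId = segmentId;
      this.blockletId = blockletId;
    }
  }

  static List<String> getExtendedBlocklets(
      List<Blocklet> pruned, Function<List<Blocklet>, List<String>> loadSegmentMetadata) {
    // Change 1: nothing matched the filter, so skip all bookkeeping.
    if (pruned.isEmpty()) {
      return Collections.emptyList();
    }
    // Change 2: group by segment (keeping first-seen order) so the lookup
    // runs once per segment instead of once per blocklet.
    Map<String, List<Blocklet>> bySegment = new LinkedHashMap<>();
    for (Blocklet b : pruned) {
      bySegment.computeIfAbsent(b.segmentId, k -> new ArrayList<>()).add(b);
    }
    List<String> extended = new ArrayList<>();
    for (List<Blocklet> segmentBlocklets : bySegment.values()) {
      extended.addAll(loadSegmentMetadata.apply(segmentBlocklets));
    }
    return extended;
  }
}
```

   Grouping trades a small in-memory map for far fewer metadata lookups when many blocklets share a segment.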
   
   Does this PR introduce any user interface change?
   No.
   
   Is any new testcase added?
   Yes.
   




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3609: [CARBONDATA-3689] Support independent MV extension and MV syntax

2020-02-14 Thread GitBox
CarbonDataQA1 commented on issue #3609: [CARBONDATA-3689] Support independent 
MV extension and MV syntax
URL: https://github.com/apache/carbondata/pull/3609#issuecomment-586303517
 
 
   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1994/
   




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3612: [CARBONDATA-3694] Separate Materialized View command from DataMap command

2020-02-14 Thread GitBox
CarbonDataQA1 commented on issue #3612: [CARBONDATA-3694] Separate Materialized 
View command from DataMap command
URL: https://github.com/apache/carbondata/pull/3612#issuecomment-586302525
 
 
   Build Success with Spark 2.4.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/291/
   




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3584: [WIP] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache

2020-02-14 Thread GitBox
CarbonDataQA1 commented on issue #3584: [WIP] Support SegmentLevel MinMax for 
better Pruning and less driver memory usage for cache
URL: https://github.com/apache/carbondata/pull/3584#issuecomment-586298946
 
 
   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1996/
   




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3584: [WIP] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache

2020-02-14 Thread GitBox
CarbonDataQA1 commented on issue #3584: [WIP] Support SegmentLevel MinMax for 
better Pruning and less driver memory usage for cache
URL: https://github.com/apache/carbondata/pull/3584#issuecomment-586298614
 
 
   Build Failed  with Spark 2.4.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/292/
   




[GitHub] [carbondata] jackylk commented on a change in pull request #3612: [CARBONDATA-3694] Separate Materialized View command from DataMap command

2020-02-14 Thread GitBox
jackylk commented on a change in pull request #3612: [CARBONDATA-3694] Separate 
Materialized View command from DataMap command
URL: https://github.com/apache/carbondata/pull/3612#discussion_r379436732
 
 

 ##
 File path: 
core/src/main/java/org/apache/carbondata/core/metadata/schema/table/DataMapSchema.java
 ##
 @@ -290,4 +301,71 @@ public boolean isTimeSeries() {
   public void setTimeSeries(boolean timeSeries) {
 isTimeSeries = timeSeries;
   }
+
+  public boolean supportIncrementalBuild() {
+String prop = getProperties().get(DataMapProperty.FULL_REFRESH);
+return prop == null || prop.equalsIgnoreCase("false");
+  }
+
+  public String getPropertiesAsString() {
+String[] properties = getProperties().entrySet().stream()
+// ignore internal used property
+.filter(p ->
+!p.getKey().equalsIgnoreCase(DataMapProperty.DEFERRED_REBUILD) &&
+!p.getKey().equalsIgnoreCase(DataMapProperty.CHILD_SELECT_QUERY) &&
 
 Review comment:
   done




[GitHub] [carbondata] jackylk commented on a change in pull request #3612: [CARBONDATA-3694] Separate Materialized View command from DataMap command

2020-02-14 Thread GitBox
jackylk commented on a change in pull request #3612: [CARBONDATA-3694] Separate 
Materialized View command from DataMap command
URL: https://github.com/apache/carbondata/pull/3612#discussion_r379436265
 
 

 ##
 File path: 
datamap/mv/core/src/main/scala/org/apache/carbondata/mv/extension/command/ShowMaterializedViewCommand.scala
 ##
 @@ -0,0 +1,97 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.mv.extension.command
+
+import java.util
+
+import scala.collection.JavaConverters._
+
+import org.apache.spark.sql.{CarbonEnv, Row, SparkSession}
+import org.apache.spark.sql.catalyst.TableIdentifier
+import org.apache.spark.sql.catalyst.expressions.{Attribute, AttributeReference}
+import org.apache.spark.sql.execution.command.{Checker, DataCommand}
+import org.apache.spark.sql.types.{BooleanType, StringType}
+
+import org.apache.carbondata.core.datamap.DataMapStoreManager
+import org.apache.carbondata.core.metadata.schema.table.DataMapSchema
+import org.apache.carbondata.mv.extension.MVDataMapProvider
+
+/**
+ * Show Materialized View Command implementation
+ *
+ */
+case class ShowMaterializedViewCommand(tableIdentifier: Option[TableIdentifier])
+  extends DataCommand {
+
+  override def output: Seq[Attribute] = {
+Seq(
+  AttributeReference("Name", StringType, nullable = false)(),
+  AttributeReference("Associated Table", StringType, nullable = false)(),
+  AttributeReference("Refresh", StringType, nullable = false)(),
+  AttributeReference("Incremental", BooleanType, nullable = false)(),
+  AttributeReference("Properties", StringType, nullable = false)(),
+  AttributeReference("Status", StringType, nullable = false)(),
+  AttributeReference("Sync Info", StringType, nullable = false)())
+  }
+
+  override def processData(sparkSession: SparkSession): Seq[Row] = {
+convertToRow(getAllMVSchema(sparkSession))
+  }
+
+  /**
+   * get all datamaps for this table, including preagg, index datamaps and mv
 
 Review comment:
   done
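The body of `getAllMVSchema` is not part of the quoted hunk. As a rough sketch only, the listing step could keep the schemas whose provider is the MV provider, optionally narrowed to one associated table when `tableIdentifier` is set. `SchemaInfo`, `MvSchemaFilter` and the provider string `"mv"` are hypothetical stand-ins, not CarbonData's `DataMapSchema` or the real value of `MVDataMapProvider.MV_PROVIDER_NAME`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical stand-in for DataMapSchema: only the fields this sketch needs.
final class SchemaInfo {
  final String name;
  final String providerName;
  final String table; // "db.table" of the associated table

  SchemaInfo(String name, String providerName, String table) {
    this.name = name;
    this.providerName = providerName;
    this.table = table;
  }
}

final class MvSchemaFilter {
  // Assumed provider name for MV schemas; not taken from CarbonData.
  static final String MV_PROVIDER = "mv";

  // Keeps only MV schemas; if a table is given, keeps only MVs on that table.
  static List<SchemaInfo> mvSchemas(List<SchemaInfo> all, Optional<String> table) {
    List<SchemaInfo> result = new ArrayList<>();
    for (SchemaInfo s : all) {
      boolean isMv = MV_PROVIDER.equalsIgnoreCase(s.providerName);
      boolean tableMatches = !table.isPresent() || table.get().equalsIgnoreCase(s.table);
      if (isMv && tableMatches) {
        result.add(s);
      }
    }
    return result;
  }
}
```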


[GitHub] [carbondata] jackylk commented on a change in pull request #3612: [CARBONDATA-3694] Separate Materialized View command from DataMap command

2020-02-14 Thread GitBox
jackylk commented on a change in pull request #3612: [CARBONDATA-3694] Separate 
Materialized View command from DataMap command
URL: https://github.com/apache/carbondata/pull/3612#discussion_r379435462
 
 

 ##
 File path: 
datamap/mv/core/src/main/scala/org/apache/carbondata/mv/extension/command/CreateMaterializedViewCommand.scala
 ##
 @@ -0,0 +1,106 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.mv.extension.command
+
+import scala.collection.JavaConverters._
+
+import org.apache.spark.sql._
+import org.apache.spark.sql.execution.command._
+
+import org.apache.carbondata.common.exceptions.sql.MalformedMaterializedViewException
+import org.apache.carbondata.common.logging.LogServiceFactory
+import org.apache.carbondata.core.datamap.{DataMapProvider, DataMapStoreManager}
+import org.apache.carbondata.core.datamap.status.DataMapStatusManager
+import org.apache.carbondata.core.metadata.schema.datamap.DataMapProperty
+import org.apache.carbondata.core.metadata.schema.table.DataMapSchema
+import org.apache.carbondata.core.util.CarbonProperties
+import org.apache.carbondata.datamap.DataMapManager
+import org.apache.carbondata.events._
+import org.apache.carbondata.mv.extension.MVDataMapProvider
+
+/**
+ * Create Materialized View Command implementation
+ * It will create the MV table, load the MV table (if deferred rebuild is false),
+ * and register the MV schema in [[DataMapStoreManager]]
+ */
+case class CreateMaterializedViewCommand(
+mvName: String,
+properties: Map[String, String],
+queryString: Option[String],
+ifNotExistsSet: Boolean = false,
+deferredRebuild: Boolean = false)
+  extends AtomicRunnableCommand {
+
+  private val LOGGER = LogServiceFactory.getLogService(this.getClass.getName)
+  private var dataMapProvider: DataMapProvider = _
+  private var dataMapSchema: DataMapSchema = _
+
+  override def processMetadata(sparkSession: SparkSession): Seq[Row] = {
+
+setAuditInfo(Map("mvName" -> mvName) ++ properties)
+
+dataMapSchema = new DataMapSchema(mvName, MVDataMapProvider.MV_PROVIDER_NAME)
+val property = properties.map(x => (x._1.trim, x._2.trim)).asJava
+val javaMap = new java.util.HashMap[String, String](property)
+javaMap.put(DataMapProperty.DEFERRED_REBUILD, deferredRebuild.toString)
+dataMapSchema.setProperties(javaMap)
+
+dataMapProvider = DataMapManager.get.getDataMapProvider(null, dataMapSchema, sparkSession)
+if (DataMapStoreManager.getInstance().getAllDataMapSchemas.asScala
+  .exists(_.getDataMapName.equalsIgnoreCase(dataMapSchema.getDataMapName))) {
+  if (!ifNotExistsSet) {
+throw new MalformedMaterializedViewException(
 
 Review comment:
   done
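The quoted part of `processMetadata` trims property keys and values, stores the deferred-rebuild flag, and rejects a duplicate MV name unless `IF NOT EXISTS` was given. A minimal Java sketch of those two steps follows. The class name, the `IllegalArgumentException` standing in for `MalformedMaterializedViewException`, and the assumption that the cut-off branch simply skips creation when `ifNotExistsSet` is true are all illustrative.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class CreateMvSketch {
  // Mirrors the diff: keys and values are trimmed, then the deferred-rebuild flag is stored.
  // deferredRebuildKey stands in for DataMapProperty.DEFERRED_REBUILD.
  static Map<String, String> prepareProperties(Map<String, String> userProps,
                                               String deferredRebuildKey,
                                               boolean deferredRebuild) {
    Map<String, String> props = new HashMap<>();
    for (Map.Entry<String, String> e : userProps.entrySet()) {
      props.put(e.getKey().trim(), e.getValue().trim());
    }
    props.put(deferredRebuildKey, Boolean.toString(deferredRebuild));
    return props;
  }

  // Returns true if the MV should be created, false if it already exists and
  // IF NOT EXISTS was given (assumed behavior); throws otherwise.
  static boolean checkName(List<String> existingNames, String mvName, boolean ifNotExists) {
    boolean exists = existingNames.stream().anyMatch(n -> n.equalsIgnoreCase(mvName));
    if (!exists) {
      return true;
    }
    if (ifNotExists) {
      return false;
    }
    // Stand-in for MalformedMaterializedViewException.
    throw new IllegalArgumentException("Materialized view with name " + mvName + " already exists");
  }
}
```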


[GitHub] [carbondata] jackylk commented on a change in pull request #3612: [CARBONDATA-3694] Separate Materialized View command from DataMap command

2020-02-14 Thread GitBox
jackylk commented on a change in pull request #3612: [CARBONDATA-3694] Separate 
Materialized View command from DataMap command
URL: https://github.com/apache/carbondata/pull/3612#discussion_r379435075
 
 

 ##
 File path: 
core/src/main/java/org/apache/carbondata/core/metadata/schema/table/DataMapSchema.java
 ##
 @@ -290,4 +301,71 @@ public boolean isTimeSeries() {
   public void setTimeSeries(boolean timeSeries) {
 isTimeSeries = timeSeries;
   }
+
+  public boolean supportIncrementalBuild() {
+String prop = getProperties().get(DataMapProperty.FULL_REFRESH);
+return prop == null || prop.equalsIgnoreCase("false");
+  }
+
+  public String getPropertiesAsString() {
+String[] properties = getProperties().entrySet().stream()
+// ignore internal used property
+.filter(p ->
+!p.getKey().equalsIgnoreCase(DataMapProperty.DEFERRED_REBUILD) &&
+!p.getKey().equalsIgnoreCase(DataMapProperty.CHILD_SELECT_QUERY) &&
+!p.getKey().equalsIgnoreCase(DataMapProperty.QUERY_TYPE) &&
+!p.getKey().equalsIgnoreCase(DataMapProperty.FULL_REFRESH))
+.map(p -> "'" + p.getKey() + "'='" + p.getValue() + "'")
+.sorted()
+.toArray(String[]::new);
+return Strings.mkString(properties, ",");
+  }
+
+  public String getTable() {
 
 Review comment:
   done
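For reference, the filtering that `getPropertiesAsString` performs can be sketched in plain Java: drop internal keys case-insensitively, render each remaining entry as `'key'='value'`, sort, and join with commas. `PropertiesFormatter` is hypothetical, `Collectors.joining` replaces the `Strings.mkString` helper used in the diff, and the set of internal keys is passed in rather than read from `DataMapProperty`.

```java
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

final class PropertiesFormatter {
  // Formats user-visible properties as 'k'='v' pairs, sorted and comma-separated.
  // internalKeys stands in for DEFERRED_REBUILD, CHILD_SELECT_QUERY, QUERY_TYPE, FULL_REFRESH.
  static String format(Map<String, String> props, Set<String> internalKeys) {
    return props.entrySet().stream()
        // ignore internally used properties, comparing keys case-insensitively
        .filter(e -> internalKeys.stream().noneMatch(k -> k.equalsIgnoreCase(e.getKey())))
        .map(e -> "'" + e.getKey() + "'='" + e.getValue() + "'")
        .sorted()
        .collect(Collectors.joining(","));
  }
}
```

Sorting the rendered strings keeps the output stable regardless of the map's iteration order, which matters for a `SHOW` command whose output users may compare.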


[GitHub] [carbondata] jackylk commented on a change in pull request #3612: [CARBONDATA-3694] Separate Materialized View command from DataMap command

2020-02-14 Thread GitBox
jackylk commented on a change in pull request #3612: [CARBONDATA-3694] Separate 
Materialized View command from DataMap command
URL: https://github.com/apache/carbondata/pull/3612#discussion_r379435221
 
 

 ##
 File path: 
datamap/mv/core/src/main/scala/org/apache/carbondata/mv/extension/MVAnalyzerRule.scala
 ##
 @@ -56,7 +56,7 @@ class MVAnalyzerRule(sparkSession: SparkSession) extends Rule[LogicalPlan] {
   // first check if any mv UDF is applied it is present is in plan
   // then call is from create MV so no need to transform the query plan
   // TODO Add different UDF name
-  case al@Alias(udf: ScalaUDF, name) if name.equalsIgnoreCase(CarbonEnv.MV_SKIP_RULE_UDF) =>
+  case al@Alias(udf: ScalaUDF, name) if name.equalsIgnoreCase(MVUdf.MV_SKIP_RULE_UDF) =>
 
 Review comment:
   I think MV should be capitalized


[GitHub] [carbondata] jackylk commented on a change in pull request #3612: [CARBONDATA-3694] Separate Materialized View command from DataMap command

2020-02-14 Thread GitBox
jackylk commented on a change in pull request #3612: [CARBONDATA-3694] Separate 
Materialized View command from DataMap command
URL: https://github.com/apache/carbondata/pull/3612#discussion_r379434538
 
 

 ##
 File path: 
core/src/main/java/org/apache/carbondata/core/metadata/schema/table/DataMapSchema.java
 ##
 @@ -290,4 +301,71 @@ public boolean isTimeSeries() {
   public void setTimeSeries(boolean timeSeries) {
 isTimeSeries = timeSeries;
   }
+
+  public boolean supportIncrementalBuild() {
+String prop = getProperties().get(DataMapProperty.FULL_REFRESH);
+return prop == null || prop.equalsIgnoreCase("false");
+  }
+
+  public String getPropertiesAsString() {
+String[] properties = getProperties().entrySet().stream()
+// ignore internal used property
+.filter(p ->
+!p.getKey().equalsIgnoreCase(DataMapProperty.DEFERRED_REBUILD) &&
+!p.getKey().equalsIgnoreCase(DataMapProperty.CHILD_SELECT_QUERY) &&
+!p.getKey().equalsIgnoreCase(DataMapProperty.QUERY_TYPE) &&
+!p.getKey().equalsIgnoreCase(DataMapProperty.FULL_REFRESH))
+.map(p -> "'" + p.getKey() + "'='" + p.getValue() + "'")
+.sorted()
+.toArray(String[]::new);
+return Strings.mkString(properties, ",");
+  }
+
+  public String getTable() {
+return relationIdentifier.getDatabaseName() + "." + relationIdentifier.getTableName();
 
 Review comment:
   done


[GitHub] [carbondata] jackylk commented on a change in pull request #3612: [CARBONDATA-3694] Separate Materialized View command from DataMap command

2020-02-14 Thread GitBox
jackylk commented on a change in pull request #3612: [CARBONDATA-3694] Separate 
Materialized View command from DataMap command
URL: https://github.com/apache/carbondata/pull/3612#discussion_r379429897
 
 

 ##
 File path: 
core/src/main/java/org/apache/carbondata/core/metadata/schema/table/DataMapSchema.java
 ##
 @@ -290,4 +301,71 @@ public boolean isTimeSeries() {
   public void setTimeSeries(boolean timeSeries) {
 isTimeSeries = timeSeries;
   }
+
+  public boolean supportIncrementalBuild() {
 
 Review comment:
   renamed to `canBeIncrementalBuild`
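Whatever the method is called, the rule in the diff reads: incremental build is possible when the `FULL_REFRESH` property is absent or equals `false` (ignoring case). A small sketch of that rule, with a hypothetical class name and a key string passed in instead of the real `DataMapProperty.FULL_REFRESH` constant:

```java
import java.util.Map;

final class IncrementalBuildCheck {
  // Same rule as the diff: incremental build is allowed unless the full-refresh
  // property is present and set to something other than "false" (case-insensitive).
  static boolean canBeIncrementalBuild(Map<String, String> props, String fullRefreshKey) {
    String prop = props.get(fullRefreshKey);
    return prop == null || prop.equalsIgnoreCase("false");
  }
}
```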


[GitHub] [carbondata] jackylk commented on a change in pull request #3612: [CARBONDATA-3694] Separate Materialized View command from DataMap command

2020-02-14 Thread GitBox
jackylk commented on a change in pull request #3612: [CARBONDATA-3694] Separate 
Materialized View command from DataMap command
URL: https://github.com/apache/carbondata/pull/3612#discussion_r379429200
 
 

 ##
 File path: 
common/src/main/java/org/apache/carbondata/common/exceptions/sql/MalformedMaterializedViewException.java
 ##
 @@ -0,0 +1,41 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.common.exceptions.sql;
+
+import org.apache.carbondata.common.annotations.InterfaceAudience;
+import org.apache.carbondata.common.annotations.InterfaceStability;
+
+/**
+ * This exception will be thrown when MV related SQL statement is invalid
+ */
+@InterfaceAudience.User
+@InterfaceStability.Stable
+public class MalformedMaterializedViewException extends MalformedCarbonCommandException {
+  /**
 
 Review comment:
   done
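Since `MalformedMaterializedViewException` extends `MalformedCarbonCommandException`, callers that already catch the general command exception will also catch the MV-specific one. A sketch of that hierarchy with hypothetical stand-in classes (the real constructors are not visible in the hunk):

```java
// Hypothetical stand-in for MalformedCarbonCommandException.
class MalformedCommandStandIn extends Exception {
  private static final long serialVersionUID = 1L;

  MalformedCommandStandIn(String msg) {
    super(msg);
  }
}

/** Thrown when an MV-related SQL statement is invalid (stand-in for the MV exception). */
class MalformedMvStandIn extends MalformedCommandStandIn {
  private static final long serialVersionUID = 1L;

  MalformedMvStandIn(String msg) {
    super(msg);
  }
}
```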


[GitHub] [carbondata] CarbonDataQA1 commented on issue #3538: [CARBONDATA-3637] Optimize insert into flow

2020-02-14 Thread GitBox
CarbonDataQA1 commented on issue #3538: [CARBONDATA-3637] Optimize insert into 
flow
URL: https://github.com/apache/carbondata/pull/3538#issuecomment-586280885
 
 
   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1993/
   


[GitHub] [carbondata] CarbonDataQA1 commented on issue #3609: [CARBONDATA-3689] Support independent MV extension and MV syntax

2020-02-14 Thread GitBox
CarbonDataQA1 commented on issue #3609: [CARBONDATA-3689] Support independent 
MV extension and MV syntax
URL: https://github.com/apache/carbondata/pull/3609#issuecomment-586273872
 
 
   Build Success with Spark 2.4.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/290/
   


[GitHub] [carbondata] jackylk commented on a change in pull request #3611: [CARBONDATA-3692] Support NoneCompression during loading data.

2020-02-14 Thread GitBox
jackylk commented on a change in pull request #3611: [CARBONDATA-3692] Support 
NoneCompression during loading data.
URL: https://github.com/apache/carbondata/pull/3611#discussion_r379408696
 
 

 ##
 File path: 
core/src/main/java/org/apache/carbondata/core/datastore/compression/NoneCompressor.java
 ##
 @@ -0,0 +1,51 @@
+package org.apache.carbondata.core.datastore.compression;
+
+import java.io.IOException;
+
+public class NoneCompressor extends AbstractCompressor {
 
 Review comment:
   You can add a description to this class stating that it does not perform any
compression.
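A sketch of what such a class does, assuming a simplified hypothetical `ByteCompressor` interface rather than CarbonData's actual `AbstractCompressor` API: both directions return the input unchanged. Returning a copy instead of the same array is a design choice made here, not something the diff shows.

```java
import java.util.Arrays;

// Hypothetical minimal interface; CarbonData's compressor API is larger.
interface ByteCompressor {
  String getName();

  byte[] compress(byte[] input);

  byte[] decompress(byte[] input);
}

/**
 * A compressor that does not perform any compression: data is stored as-is.
 * Copies are returned so callers never share a mutable buffer.
 */
final class PassThroughCompressor implements ByteCompressor {
  @Override
  public String getName() {
    return "none";
  }

  @Override
  public byte[] compress(byte[] input) {
    return Arrays.copyOf(input, input.length);
  }

  @Override
  public byte[] decompress(byte[] input) {
    return Arrays.copyOf(input, input.length);
  }
}
```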


[GitHub] [carbondata] jackylk commented on a change in pull request #3611: [CARBONDATA-3692] Support NoneCompression during loading data.

2020-02-14 Thread GitBox
jackylk commented on a change in pull request #3611: [CARBONDATA-3692] Support 
NoneCompression during loading data.
URL: https://github.com/apache/carbondata/pull/3611#discussion_r379408450
 
 

 ##
 File path: 
core/src/main/java/org/apache/carbondata/core/datastore/compression/NoneCompressor.java
 ##
 @@ -0,0 +1,51 @@
+package org.apache.carbondata.core.datastore.compression;
 
 Review comment:
   Please add the license header, as in the other source files


[GitHub] [carbondata] jackylk commented on a change in pull request #3611: [CARBONDATA-3692] Support NoneCompression during loading data.

2020-02-14 Thread GitBox
jackylk commented on a change in pull request #3611: [CARBONDATA-3692] Support 
NoneCompression during loading data.
URL: https://github.com/apache/carbondata/pull/3611#discussion_r379408170
 
 

 ##
 File path: 
core/src/main/java/org/apache/carbondata/core/datastore/compression/CompressorFactory.java
 ##
 @@ -35,6 +35,7 @@
   private final Map allSupportedCompressors = new HashMap<>();
 
   public enum NativeSupportedCompressor {
+NONE("none",NoneCompressor.class),
 
 Review comment:
   add space after `,`
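With the spacing fixed, the entry reads `NONE("none", NoneCompressor.class),`. The sketch below shows, in simplified form, how a name-keyed enum like `NativeSupportedCompressor` can be looked up case-insensitively. `SupportedCodec` is hypothetical, stores only names (not implementation classes), and its non-`NONE` entries are illustrative.

```java
// Hypothetical, simplified name-keyed enum; not CarbonData's NativeSupportedCompressor.
enum SupportedCodec {
  NONE("none"),
  SNAPPY("snappy"),
  ZSTD("zstd");

  private final String codecName;

  SupportedCodec(String codecName) {
    this.codecName = codecName;
  }

  String getCodecName() {
    return codecName;
  }

  // Case-insensitive lookup by name; returns null when the name is unsupported.
  static SupportedCodec fromName(String name) {
    for (SupportedCodec c : values()) {
      if (c.codecName.equalsIgnoreCase(name)) {
        return c;
      }
    }
    return null;
  }
}
```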


[GitHub] [carbondata] CarbonDataQA1 commented on issue #3538: [CARBONDATA-3637] Optimize insert into flow

2020-02-14 Thread GitBox
CarbonDataQA1 commented on issue #3538: [CARBONDATA-3637] Optimize insert into 
flow
URL: https://github.com/apache/carbondata/pull/3538#issuecomment-586253092
 
 
   Build Success with Spark 2.4.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/289/
   


[GitHub] [carbondata] CarbonDataQA1 commented on issue #3609: [CARBONDATA-3689] Support independent MV extension and MV syntax

2020-02-14 Thread GitBox
CarbonDataQA1 commented on issue #3609: [CARBONDATA-3689] Support independent 
MV extension and MV syntax
URL: https://github.com/apache/carbondata/pull/3609#issuecomment-586245352
 
 
   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1989/
   


[GitHub] [carbondata] ajantha-bhat commented on issue #3538: [CARBONDATA-3637] Optimize insert into flow

2020-02-14 Thread GitBox
ajantha-bhat commented on issue #3538: [CARBONDATA-3637] Optimize insert into 
flow
URL: https://github.com/apache/carbondata/pull/3538#issuecomment-586219611
 
 
   retest this please


[GitHub] [carbondata] CarbonDataQA1 commented on issue #3538: [CARBONDATA-3637] Optimize insert into flow

2020-02-14 Thread GitBox
CarbonDataQA1 commented on issue #3538: [CARBONDATA-3637] Optimize insert into 
flow
URL: https://github.com/apache/carbondata/pull/3538#issuecomment-586218935
 
 
   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1988/
   


[GitHub] [carbondata] akashrn5 commented on a change in pull request #3584: [WIP] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache

2020-02-14 Thread GitBox
akashrn5 commented on a change in pull request #3584: [WIP] Support 
SegmentLevel MinMax for better Pruning and less driver memory usage for cache
URL: https://github.com/apache/carbondata/pull/3584#discussion_r379376668
 
 

 ##
 File path: 
core/src/main/java/org/apache/carbondata/core/metadata/SegmentFileStore.java
 ##
 @@ -182,9 +184,14 @@ public static String genSegmentFileName(String segmentId, String UUID) {
* @param UUID  a UUID string used to construct the segment file name
* @return segment file name
 
 Review comment:
   Add the new parameter `segmentMinMaxList` to the comment as well


[GitHub] [carbondata] akashrn5 commented on a change in pull request #3584: [WIP] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache

2020-02-14 Thread GitBox
akashrn5 commented on a change in pull request #3584: [WIP] Support 
SegmentLevel MinMax for better Pruning and less driver memory usage for cache
URL: https://github.com/apache/carbondata/pull/3584#discussion_r379378987
 
 

 ##
 File path: 
integration/spark-common/src/main/scala/org/apache/carbondata/spark/rdd/UpdateDataLoad.scala
 ##
 @@ -58,6 +61,9 @@ object UpdateDataLoad {
   loadMetadataDetails.setSegmentStatus(SegmentStatus.SUCCESS)
   val executor = new DataLoadExecutor
   TaskContext.get().addTaskCompletionListener { context =>
+accumulator.add(SegmentMinMaxStats.getInstance().getSegmentMinMaxMap.
+  asScala.mapValues(_.asScala.toList).toMap)
+SegmentMinMaxStats.getInstance().clear()
 
 Review comment:
   same as above


[GitHub] [carbondata] akashrn5 commented on a change in pull request #3584: [WIP] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache

2020-02-14 Thread GitBox
akashrn5 commented on a change in pull request #3584: [WIP] Support 
SegmentLevel MinMax for better Pruning and less driver memory usage for cache
URL: https://github.com/apache/carbondata/pull/3584#discussion_r379341045
 
 

 ##
 File path: 
core/src/main/java/org/apache/carbondata/core/indexstore/SegmentBlockInfo.java
 ##
 @@ -0,0 +1,45 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.core.indexstore;
+
+import java.util.List;
+import java.util.Set;
+
+import org.apache.carbondata.core.util.SegmentMinMax;
+
+public class SegmentBlockInfo {
 
 Review comment:
   I think you can rename the class to `SegmentBlockIndexInfo` or `SegmentIndexInfo`


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [carbondata] akashrn5 commented on a change in pull request #3584: [WIP] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache

2020-02-14 Thread GitBox
akashrn5 commented on a change in pull request #3584: [WIP] Support 
SegmentLevel MinMax for better Pruning and less driver memory usage for cache
URL: https://github.com/apache/carbondata/pull/3584#discussion_r379264536
 
 

 ##
 File path: core/src/main/java/org/apache/carbondata/core/datamap/Segment.java
 ##
 @@ -85,6 +86,8 @@
*/
   private transient Map options;
 
+  private List segmentMinMax;
 
 Review comment:
   Add a comment describing what it stores


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [carbondata] akashrn5 commented on a change in pull request #3584: [WIP] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache

2020-02-14 Thread GitBox
akashrn5 commented on a change in pull request #3584: [WIP] Support 
SegmentLevel MinMax for better Pruning and less driver memory usage for cache
URL: https://github.com/apache/carbondata/pull/3584#discussion_r379374364
 
 

 ##
 File path: 
core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletDataMapFactory.java
 ##
 @@ -140,9 +149,66 @@ public DataMapBuilder createBuilder(Segment segment, String shardName,
   segmentMap.put(segment.getSegmentNo(), segment);
   Set identifiers =
   getTableBlockIndexUniqueIdentifiers(segment);
-  // get tableBlockIndexUniqueIdentifierWrappers from segment file info
-  getTableBlockUniqueIdentifierWrappers(partitionsToPrune,
-  tableBlockIndexUniqueIdentifierWrappers, identifiers);
+  if (null != partitionsToPrune && !partitionsToPrune.isEmpty()) {
+// get tableBlockIndexUniqueIdentifierWrappers from segment file info
+getTableBlockUniqueIdentifierWrappers(partitionsToPrune,
+tableBlockIndexUniqueIdentifierWrappers, identifiers);
+  } else {
+List segmentMinMaxList = segment.getSegmentMinMax();
+//boolean isLoadAllIndex = Boolean.parseBoolean(CarbonProperties.getInstance()
+//.getProperty(CarbonCommonConstants.CARBON_LOAD_ALL_INDEX_TO_CACHE,
+//CarbonCommonConstants.CARBON_LOAD_ALL_INDEX_TO_CACHE_DEFAULT));
+if (null != segmentMinMaxList && !filter.isEmpty() && null != filter && null == FilterUtil
+.getImplicitFilterExpression(filter.getExpression())) {
+  boolean isScanRequired = false;
+  for (SegmentMinMax segmentMinMax : segmentMinMaxList) {
+Map minValues = segmentMinMax.getMinValues();
+Map maxValues = segmentMinMax.getMaxValues();
+int length = minValues.size();
+List columnSchemas = new ArrayList<>();
 
 Review comment:
   Why is this additional list required?
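The loop in this hunk decides, per segment, whether the segment-level min/max values allow the segment to be skipped. The following is a simplified sketch of that idea for a single equality filter on a numeric column. `SegmentStats` and `SegmentPruner` are hypothetical, and treating missing statistics as "scan required" is an assumption chosen to stay conservative. Like the `isScanRequired` flag in the diff, it starts from "no scan" and flips on the first overlapping entry.

```java
import java.util.List;
import java.util.Map;

// Hypothetical per-segment statistics: column name -> min / max value.
final class SegmentStats {
  final Map<String, Long> minValues;
  final Map<String, Long> maxValues;

  SegmentStats(Map<String, Long> minValues, Map<String, Long> maxValues) {
    this.minValues = minValues;
    this.maxValues = maxValues;
  }
}

final class SegmentPruner {
  // Returns true if any stats entry of the segment may contain rows where column == value.
  static boolean isScanRequired(List<SegmentStats> stats, String column, long value) {
    for (SegmentStats s : stats) {
      Long min = s.minValues.get(column);
      Long max = s.maxValues.get(column);
      if (min == null || max == null) {
        return true; // no statistics for this column: cannot prune safely
      }
      if (min <= value && value <= max) {
        return true; // value falls inside [min, max]
      }
    }
    return false; // every entry excludes the value, so the segment can be skipped
  }
}
```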


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [carbondata] akashrn5 commented on a change in pull request #3584: [WIP] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache

2020-02-14 Thread GitBox
akashrn5 commented on a change in pull request #3584: [WIP] Support 
SegmentLevel MinMax for better Pruning and less driver memory usage for cache
URL: https://github.com/apache/carbondata/pull/3584#discussion_r379343378
 
 

 ##
 File path: 
core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletDataMapFactory.java
 ##
 @@ -158,6 +171,74 @@ public DataMapBuilder createBuilder(Segment segment, String shardName,
 return dataMaps;
   }
 
+  private void getTableBlockIndexUniqueIdentifierUsingSegmentPruning(DataMapFilter filter,
+  List tableBlockIndexUniqueIdentifierWrappers,
+  Segment segment, Set identifiers) {
+List segmentMinMaxList = segment.getSegmentMinMax();
+//boolean isLoadAllIndex = Boolean.parseBoolean(CarbonProperties.getInstance()
+//.getProperty(CarbonCommonConstants.CARBON_LOAD_ALL_INDEX_TO_CACHE,
+//CarbonCommonConstants.CARBON_LOAD_ALL_INDEX_TO_CACHE_DEFAULT));
+if (null != segmentMinMaxList && !filter.isEmpty()) {
 
 Review comment:
   I think for `filter` you should have a null check instead of an empty check
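The related condition in the earlier hunk of this PR, `!filter.isEmpty() && null != filter`, calls `isEmpty()` before the null check and would throw a `NullPointerException` for a null filter. Ordering the checks so that `&&` short-circuits on null avoids that. `FilterStub` is a hypothetical stand-in for `DataMapFilter`:

```java
// Hypothetical stand-in for DataMapFilter with only isEmpty().
final class FilterStub {
  private final String expression;

  FilterStub(String expression) {
    this.expression = expression;
  }

  boolean isEmpty() {
    return expression == null;
  }
}

final class FilterChecks {
  // Null check first: && short-circuits, so isEmpty() is never called on null.
  static boolean hasFilter(FilterStub filter) {
    return filter != null && !filter.isEmpty();
  }
}
```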


[GitHub] [carbondata] akashrn5 commented on a change in pull request #3584: [WIP] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache

2020-02-14 Thread GitBox
akashrn5 commented on a change in pull request #3584: [WIP] Support 
SegmentLevel MinMax for better Pruning and less driver memory usage for cache
URL: https://github.com/apache/carbondata/pull/3584#discussion_r379304868
 
 

 ##
 File path: 
core/src/main/java/org/apache/carbondata/core/indexstore/SegmentBlockInfo.java
 ##
 @@ -0,0 +1,45 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.core.indexstore;
+
+import java.util.List;
+import java.util.Set;
+
+import org.apache.carbondata.core.util.SegmentMinMax;
+
+public class SegmentBlockInfo {
 
 Review comment:
   Add class-level and variable-level comments
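One possible shape for the requested comments, assuming the class pairs a segment's index file identifiers with its segment-level min/max list (as the `new SegmentBlockInfo(tableBlockIndexUniqueIdentifiers, segment.getSegmentMinMax())` call in the suggested code of this review implies) and using the reviewer's suggested name `SegmentIndexInfo`. The type parameters stand in for `TableBlockIndexUniqueIdentifier` and `SegmentMinMax`; this is a hypothetical sketch, not the PR's class.

```java
import java.util.Collections;
import java.util.List;
import java.util.Set;

/**
 * Holds, for one segment, the index file identifiers together with the
 * segment-level min/max statistics, so both can be cached and reused for pruning.
 *
 * @param <I> stand-in for TableBlockIndexUniqueIdentifier
 * @param <M> stand-in for SegmentMinMax
 */
final class SegmentIndexInfo<I, M> {
  /** Unique identifiers of the index files belonging to this segment. */
  private final Set<I> tableBlockIndexUniqueIdentifiers;

  /** Segment-level min/max statistics used for pruning; empty if none were recorded. */
  private final List<M> segmentMinMax;

  SegmentIndexInfo(Set<I> identifiers, List<M> segmentMinMax) {
    this.tableBlockIndexUniqueIdentifiers = Collections.unmodifiableSet(identifiers);
    this.segmentMinMax = segmentMinMax == null
        ? Collections.<M>emptyList()
        : Collections.unmodifiableList(segmentMinMax);
  }

  Set<I> getTableBlockIndexUniqueIdentifiers() {
    return tableBlockIndexUniqueIdentifiers;
  }

  List<M> getSegmentMinMax() {
    return segmentMinMax;
  }
}
```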


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [carbondata] akashrn5 commented on a change in pull request #3584: [WIP] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache

2020-02-14 Thread GitBox
akashrn5 commented on a change in pull request #3584: [WIP] Support 
SegmentLevel MinMax for better Pruning and less driver memory usage for cache
URL: https://github.com/apache/carbondata/pull/3584#discussion_r379339875
 
 

 ##
 File path: 
core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletDataMapFactory.java
 ##
 @@ -211,15 +292,22 @@ private void getTableBlockUniqueIdentifierWrappers(List partition
 
   public Set<TableBlockIndexUniqueIdentifier> getTableBlockIndexUniqueIdentifiers(Segment segment)
 
 Review comment:
   Replace this method with the following and verify it once:
   
   SegmentBlockInfo segmentBlockInfo = segmentMap.get(segment.getSegmentNo());
   Set<TableBlockIndexUniqueIdentifier> tableBlockIndexUniqueIdentifiers;
   if (null != segmentBlockInfo) {
     segment.setSegmentMinMax(segmentMap.get(segment.getSegmentNo()).getSegmentMinMax());
     tableBlockIndexUniqueIdentifiers =
         segmentBlockInfo.getTableBlockIndexUniqueIdentifiers();
   } else {
     tableBlockIndexUniqueIdentifiers =
         BlockletDataMapUtil.getTableBlockUniqueIdentifiers(segment);
     if (tableBlockIndexUniqueIdentifiers.size() > 0) {
       segmentMap.put(segment.getSegmentNo(),
           new SegmentBlockInfo(tableBlockIndexUniqueIdentifiers, segment.getSegmentMinMax()));
     }
   }
   return tableBlockIndexUniqueIdentifiers;
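   The suggested flow is a look-aside cache keyed by segment number: reuse the cached identifiers when present, otherwise compute them and cache only non-empty results. A minimal self-contained sketch, assuming `String` stand-ins for the CarbonData types; `CachedSegment`, `SegmentCacheSketch` and the `loader` function are hypothetical stand-ins for `SegmentBlockInfo`, `segmentMap` and `BlockletDataMapUtil.getTableBlockUniqueIdentifiers`:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

/** Hypothetical stand-in for SegmentBlockInfo: cached identifiers plus segment min/max. */
final class CachedSegment {
  final Set<String> identifiers;
  final List<String> segmentMinMax;

  CachedSegment(Set<String> identifiers, List<String> segmentMinMax) {
    this.identifiers = identifiers;
    this.segmentMinMax = segmentMinMax;
  }
}

/** Hypothetical look-aside cache keyed by segment number. */
final class SegmentCacheSketch {
  private final Map<String, CachedSegment> segmentMap = new HashMap<>();
  // Stand-in for BlockletDataMapUtil.getTableBlockUniqueIdentifiers(segment)
  private final Function<String, Set<String>> loader;
  int loads = 0; // how many times the expensive loader ran

  SegmentCacheSketch(Function<String, Set<String>> loader) {
    this.loader = loader;
  }

  Set<String> getIdentifiers(String segmentNo, List<String> segmentMinMax) {
    CachedSegment cached = segmentMap.get(segmentNo);
    if (cached != null) {
      // Cache hit: reuse what was computed for this segment earlier
      return cached.identifiers;
    }
    loads++;
    Set<String> identifiers = loader.apply(segmentNo);
    // Cache only segments that actually have index files
    if (!identifiers.isEmpty()) {
      segmentMap.put(segmentNo, new CachedSegment(identifiers, segmentMinMax));
    }
    return identifiers;
  }
}
```

   Caching only non-empty results mirrors the `size() > 0` guard in the suggestion: a segment with no index files is looked up again on the next call instead of being pinned as an empty entry.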


[GitHub] [carbondata] akashrn5 commented on a change in pull request #3584: [WIP] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache

2020-02-14 Thread GitBox
akashrn5 commented on a change in pull request #3584: [WIP] Support 
SegmentLevel MinMax for better Pruning and less driver memory usage for cache
URL: https://github.com/apache/carbondata/pull/3584#discussion_r379264459
 
 

 ##
 File path: 
core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
 ##
 @@ -2333,4 +2333,9 @@ private CarbonCommonConstants() {
* Default first day of week
*/
   public static final String CARBON_TIMESERIES_FIRST_DAY_OF_WEEK_DEFAULT = "SUNDAY";
+
 
 Review comment:
   Please add a comment.


[GitHub] [carbondata] akashrn5 commented on a change in pull request #3584: [WIP] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache

2020-02-14 Thread GitBox
akashrn5 commented on a change in pull request #3584: [WIP] Support 
SegmentLevel MinMax for better Pruning and less driver memory usage for cache
URL: https://github.com/apache/carbondata/pull/3584#discussion_r379378160
 
 

 ##
 File path: 
core/src/main/java/org/apache/carbondata/core/util/SegmentMinMaxStats.java
 ##
 @@ -0,0 +1,61 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.core.util;
+
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+
+/**
+ * Holds list of block level min max for each segment
+ */
+public class SegmentMinMaxStats {
+
+  private SegmentMinMaxStats() {
+  }
+
+  public static SegmentMinMaxStats getInstance() {
+    return segmentMinMaxStats;
+  }
+
+  private Map<String, List<SegmentMinMax>> segmentMinMaxMap = new HashMap<>();
+
+  private static final SegmentMinMaxStats segmentMinMaxStats = new SegmentMinMaxStats();
+
+  public Map<String, List<SegmentMinMax>> getSegmentMinMaxMap() {
+    return segmentMinMaxMap;
+  }
+
+  public void setSegmentMinMaxList(String segmentId, Map minValues,
+      Map maxValues) {
+    if (this.segmentMinMaxMap.get(segmentId) == null) {
+      List<SegmentMinMax> segmentMinMaxList = new ArrayList<>();
+      segmentMinMaxList.add(new SegmentMinMax(minValues, maxValues));
+      this.segmentMinMaxMap.put(segmentId, segmentMinMaxList);
+    } else {
+      this.segmentMinMaxMap.get(segmentId)
+          .add(new SegmentMinMax(minValues, maxValues));
+    }
+  }
+
+  public void clear() {
 
 Review comment:
   this method can be removed


[GitHub] [carbondata] akashrn5 commented on a change in pull request #3584: [WIP] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache

2020-02-14 Thread GitBox
akashrn5 commented on a change in pull request #3584: [WIP] Support 
SegmentLevel MinMax for better Pruning and less driver memory usage for cache
URL: https://github.com/apache/carbondata/pull/3584#discussion_r379377714
 
 

 ##
 File path: 
core/src/main/java/org/apache/carbondata/core/metadata/SegmentFileStore.java
 ##
 @@ -1265,6 +1292,21 @@ void addPath(String path, FolderDetails details) {
     public void setOptions(Map options) {
       this.options = options;
     }
+
+    public List<SegmentMinMax> getSegmentMinMax() {
+      List<SegmentMinMax> segmentMinMaxList = null;
+      try {
+        segmentMinMaxList =
+            (List<SegmentMinMax>) ObjectSerializationUtil.convertStringToObject(segmentMinMax);
+      } catch (IOException e) {
+        LOGGER.error("Error while getting segment minmax");
+      }
+      return segmentMinMaxList;
 
 Review comment:
   I recommend returning an empty list instead of null, and then removing the null checks everywhere this method is called.
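   Returning an empty list on failure lets every caller iterate the result directly. A minimal sketch of that pattern; `SegmentMinMaxReader` and its `Decoder` interface are hypothetical, with `Decoder` standing in for `ObjectSerializationUtil.convertStringToObject`, and `System.err` standing in for the logger:

```java
import java.io.IOException;
import java.util.Collections;
import java.util.List;

final class SegmentMinMaxReader {
  /** Hypothetical decoder standing in for ObjectSerializationUtil.convertStringToObject. */
  interface Decoder {
    Object decode(String serialized) throws IOException;
  }

  /** Returns the decoded list, or an empty list (never null) if absent or unreadable. */
  @SuppressWarnings("unchecked")
  static <T> List<T> readList(String serialized, Decoder decoder) {
    if (serialized == null) {
      return Collections.emptyList();
    }
    try {
      Object decoded = decoder.decode(serialized);
      return decoded == null ? Collections.<T>emptyList() : (List<T>) decoded;
    } catch (IOException e) {
      // Log and fall back to an empty list so callers need no null check
      System.err.println("Error while getting segment minmax: " + e.getMessage());
      return Collections.emptyList();
    }
  }
}
```

   Note that `Collections.emptyList()` is immutable, so callers that need to add to the result should copy it first.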


[GitHub] [carbondata] akashrn5 commented on a change in pull request #3584: [WIP] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache

2020-02-14 Thread GitBox
akashrn5 commented on a change in pull request #3584: [WIP] Support 
SegmentLevel MinMax for better Pruning and less driver memory usage for cache
URL: https://github.com/apache/carbondata/pull/3584#discussion_r379377880
 
 

 ##
 File path: 
core/src/main/java/org/apache/carbondata/core/util/SegmentMinMax.java
 ##
 @@ -0,0 +1,58 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.core.util;
+
+import java.io.Serializable;
+import java.util.Map;
+
+/**
+ * Holds Min, Max and columnCardinality values for each segment block
+ */
+public class SegmentMinMax implements Serializable {
+
+  /**
+   * Map of column names to their block-level min values
+   */
+  private Map minValues;
+
+  /**
+   * Map of column names to their block-level max values
+   */
+  private Map maxValues;
+
+  SegmentMinMax(Map minValues, Map maxValues) {
+    this.minValues = minValues;
+    this.maxValues = maxValues;
+  }
+
+  public Map getMinValues() {
+    return minValues;
+  }
+
+  public void setMinValues(Map minValues) {
+    this.minValues = minValues;
+  }
+
+  public Map getMaxValues() {
+    return maxValues;
+  }
+
+  public void setMaxValues(Map maxValues) {
+    this.maxValues = maxValues;
+  }
+}
 
 Review comment:
   Add a new line at the end of the class.


[GitHub] [carbondata] akashrn5 commented on a change in pull request #3584: [WIP] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache

2020-02-14 Thread GitBox
akashrn5 commented on a change in pull request #3584: [WIP] Support 
SegmentLevel MinMax for better Pruning and less driver memory usage for cache
URL: https://github.com/apache/carbondata/pull/3584#discussion_r379264433
 
 

 ##
 File path: 
core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
 ##
 @@ -2333,4 +2333,9 @@ private CarbonCommonConstants() {
* Default first day of week
*/
   public static final String CARBON_TIMESERIES_FIRST_DAY_OF_WEEK_DEFAULT = "SUNDAY";
+
+  public static final String CARBON_LOAD_ALL_INDEX_TO_CACHE = "carbon.load.all.indexes.to.cache";
+
+  public static final String CARBON_LOAD_ALL_INDEX_TO_CACHE_DEFAULT = "true";
 
 Review comment:
   Add a comment explaining when this should be set to false.


[GitHub] [carbondata] akashrn5 commented on a change in pull request #3584: [WIP] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache

2020-02-14 Thread GitBox
akashrn5 commented on a change in pull request #3584: [WIP] Support 
SegmentLevel MinMax for better Pruning and less driver memory usage for cache
URL: https://github.com/apache/carbondata/pull/3584#discussion_r379378744
 
 

 ##
 File path: 
integration/spark-common/src/main/scala/org/apache/carbondata/spark/rdd/InsertTaskCompletionListener.scala
 ##
 @@ -17,19 +17,27 @@
 
 package org.apache.carbondata.spark.rdd
 
+import scala.collection.JavaConverters._
+
 import org.apache.spark.TaskContext
 import org.apache.spark.sql.carbondata.execution.datasources.tasklisteners.CarbonLoadTaskCompletionListener
 import org.apache.spark.sql.execution.command.ExecutionErrors
+import org.apache.spark.util.CollectionAccumulator
 
-import org.apache.carbondata.core.util.{DataTypeUtil, ThreadLocalTaskInfo}
+import org.apache.carbondata.core.util.{DataTypeUtil, SegmentMinMax, SegmentMinMaxStats, ThreadLocalTaskInfo}
 import org.apache.carbondata.processing.loading.{DataLoadExecutor, FailureCauses}
 import org.apache.carbondata.spark.util.CommonUtil
 
 class InsertTaskCompletionListener(dataLoadExecutor: DataLoadExecutor,
-    executorErrors: ExecutionErrors)
+    executorErrors: ExecutionErrors,
+    accumulator: CollectionAccumulator[Map[String, List[SegmentMinMax]]])
   extends CarbonLoadTaskCompletionListener {
   override def onTaskCompletion(context: TaskContext): Unit = {
     try {
+      // add segment level minMax to accumulator
+      accumulator.add(SegmentMinMaxStats.getInstance().getSegmentMinMaxMap.
+        asScala.mapValues(_.asScala.toList).toMap)
+      SegmentMinMaxStats.getInstance().clear()
 
 Review comment:
   This can simply be `map.clear()`.


[GitHub] [carbondata] akashrn5 commented on a change in pull request #3584: [WIP] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache

2020-02-14 Thread GitBox
akashrn5 commented on a change in pull request #3584: [WIP] Support 
SegmentLevel MinMax for better Pruning and less driver memory usage for cache
URL: https://github.com/apache/carbondata/pull/3584#discussion_r379317644
 
 

 ##
 File path: 
core/src/main/java/org/apache/carbondata/core/util/SegmentMinMaxStats.java
 ##
 @@ -0,0 +1,61 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.core.util;
+
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+
+/**
+ * Holds list of block level min max for each segment
+ */
+public class SegmentMinMaxStats {
+
+  private SegmentMinMaxStats() {
+  }
+
+  public static SegmentMinMaxStats getInstance() {
 
 Review comment:
   In `getInstance`, create the object and the map only once and reuse them for each load; just clear the map entries once they have been added to the accumulator.
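   The suggestion amounts to a singleton whose map is allocated once and only emptied after each task has published its entries. A minimal sketch, assuming `String` stand-ins for `SegmentMinMax`; `MinMaxStatsSketch` and its `drainForAccumulator()` method, which snapshots the entries and then calls `Map.clear()`, are hypothetical names:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical per-JVM holder: one instance, one map, reused across loads. */
final class MinMaxStatsSketch {
  // Created exactly once, when the class is initialised
  private static final MinMaxStatsSketch INSTANCE = new MinMaxStatsSketch();

  // Created once and reused; only its entries are cleared between loads
  private final Map<String, List<String>> segmentMinMaxMap = new HashMap<>();

  private MinMaxStatsSketch() {
  }

  static MinMaxStatsSketch getInstance() {
    return INSTANCE;
  }

  Map<String, List<String>> getSegmentMinMaxMap() {
    return segmentMinMaxMap;
  }

  /**
   * Copies the current entries (e.g. to hand to an accumulator), then clears
   * the shared map so the next load starts empty. Not thread-safe.
   */
  Map<String, List<String>> drainForAccumulator() {
    Map<String, List<String>> snapshot = new HashMap<>(segmentMinMaxMap);
    segmentMinMaxMap.clear();
    return snapshot;
  }
}
```

   One caveat the sketch does not address: a Spark executor can run several tasks concurrently in the same JVM, so a shared singleton like this would need synchronization or per-task state in practice.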


[GitHub] [carbondata] akashrn5 commented on a change in pull request #3584: [WIP] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache

2020-02-14 Thread GitBox
akashrn5 commented on a change in pull request #3584: [WIP] Support 
SegmentLevel MinMax for better Pruning and less driver memory usage for cache
URL: https://github.com/apache/carbondata/pull/3584#discussion_r379378567
 
 

 ##
 File path: 
core/src/main/java/org/apache/carbondata/core/util/SegmentMinMaxStats.java
 ##
 @@ -0,0 +1,61 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.core.util;
+
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+
+/**
+ * Holds list of block level min max for each segment
+ */
+public class SegmentMinMaxStats {
+
+  private SegmentMinMaxStats() {
+  }
+
+  public static SegmentMinMaxStats getInstance() {
+    return segmentMinMaxStats;
+  }
+
+  private Map<String, List<SegmentMinMax>> segmentMinMaxMap = new HashMap<>();
+
+  private static final SegmentMinMaxStats segmentMinMaxStats = new SegmentMinMaxStats();
+
+  public Map<String, List<SegmentMinMax>> getSegmentMinMaxMap() {
+    return segmentMinMaxMap;
+  }
+
+  public void setSegmentMinMaxList(String segmentId, Map minValues,
+      Map maxValues) {
+    if (this.segmentMinMaxMap.get(segmentId) == null) {
+      List<SegmentMinMax> segmentMinMaxList = new ArrayList<>();
+      segmentMinMaxList.add(new SegmentMinMax(minValues, maxValues));
+      this.segmentMinMaxMap.put(segmentId, segmentMinMaxList);
+    } else {
+      this.segmentMinMaxMap.get(segmentId)
 
 Review comment:
   In the `if` check, `this.segmentMinMaxMap.get(segmentId)` is already null, so put the segment ID in the map here; otherwise a NullPointerException can occur.
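   `Map.computeIfAbsent` folds the null check, the `put` and the append into a single call, so there is no branch in which `get(segmentId)` can return null. A minimal sketch using `String` in place of `SegmentMinMax`; `SegmentListAppender` is a hypothetical name:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: appends one block's min/max entry to the list for its segment. */
final class SegmentListAppender {
  private final Map<String, List<String>> segmentMinMaxMap = new HashMap<>();

  void add(String segmentId, String blockMinMax) {
    // computeIfAbsent creates and stores the list on first use, so there is
    // no separate get()/put() path that could observe a null value
    segmentMinMaxMap.computeIfAbsent(segmentId, id -> new ArrayList<>()).add(blockMinMax);
  }

  Map<String, List<String>> getSegmentMinMaxMap() {
    return segmentMinMaxMap;
  }
}
```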


[GitHub] [carbondata] akashrn5 commented on a change in pull request #3584: [WIP] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache

2020-02-14 Thread GitBox
akashrn5 commented on a change in pull request #3584: [WIP] Support 
SegmentLevel MinMax for better Pruning and less driver memory usage for cache
URL: https://github.com/apache/carbondata/pull/3584#discussion_r379346187
 
 

 ##
 File path: 
core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletDataMapFactory.java
 ##
 @@ -158,6 +171,74 @@ public DataMapBuilder createBuilder(Segment segment, String shardName,
     return dataMaps;
   }
 
+  private void getTableBlockIndexUniqueIdentifierUsingSegmentPruning(DataMapFilter filter,
 
 Review comment:
   Better to rename `getTableBlockIndexUniqueIdentifierUsingSegmentPruning` to `getTableBlockIndexUniqueIdentifierUsingSegmentMinMax`.


[GitHub] [carbondata] akashrn5 commented on a change in pull request #3584: [WIP] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache

2020-02-14 Thread GitBox
akashrn5 commented on a change in pull request #3584: [WIP] Support 
SegmentLevel MinMax for better Pruning and less driver memory usage for cache
URL: https://github.com/apache/carbondata/pull/3584#discussion_r379379224
 
 

 ##
 File path: 
integration/spark2/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala
 ##
 @@ -315,7 +316,10 @@ object CarbonDataRDDFactory {
     val carbonTable = carbonLoadModel.getCarbonDataLoadSchema.getCarbonTable
     var status: Array[(String, (LoadMetadataDetails, ExecutionErrors))] = null
     var res: Array[List[(String, (LoadMetadataDetails, ExecutionErrors))]] = null
-
+    // accumulator to collect segment minmax
+    val minMaxAccumulator = sqlContext
 
 Review comment:
   rename to `segmentMinMaxAccumulator`


[GitHub] [carbondata] CarbonDataQA1 commented on issue #3584: [WIP] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache

2020-02-14 Thread GitBox
CarbonDataQA1 commented on issue #3584: [WIP] Support SegmentLevel MinMax for 
better Pruning and less driver memory usage for cache
URL: https://github.com/apache/carbondata/pull/3584#issuecomment-586215511
 
 
   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1990/
   


[GitHub] [carbondata] CarbonDataQA1 commented on issue #3584: [WIP] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache

2020-02-14 Thread GitBox
CarbonDataQA1 commented on issue #3584: [WIP] Support SegmentLevel MinMax for 
better Pruning and less driver memory usage for cache
URL: https://github.com/apache/carbondata/pull/3584#issuecomment-586207985
 
 
   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1992/
   


[GitHub] [carbondata] CarbonDataQA1 commented on issue #3584: [WIP] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache

2020-02-14 Thread GitBox
CarbonDataQA1 commented on issue #3584: [WIP] Support SegmentLevel MinMax for 
better Pruning and less driver memory usage for cache
URL: https://github.com/apache/carbondata/pull/3584#issuecomment-586207774
 
 
   Build Failed  with Spark 2.4.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/288/
   


[GitHub] [carbondata] CarbonDataQA1 commented on issue #1363: [CARBONDATA-1378] support creating table in hive

2020-02-14 Thread GitBox
CarbonDataQA1 commented on issue #1363: [CARBONDATA-1378] support creating 
table in hive
URL: https://github.com/apache/carbondata/pull/1363#issuecomment-586204474
 
 
   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1991/
   


[GitHub] [carbondata] Pickupolddriver commented on a change in pull request #3611: [CARBONDATA-3692] Support NoneCompression during loading data.

2020-02-14 Thread GitBox
Pickupolddriver commented on a change in pull request #3611: [CARBONDATA-3692] 
Support NoneCompression during loading data.
URL: https://github.com/apache/carbondata/pull/3611#discussion_r379335042
 
 

 ##
 File path: 
integration/spark-common-test/src/test/scala/org/apache/carbondata/integration/spark/testsuite/dataload/TestLoadDataWithCompression.scala
 ##
 @@ -272,6 +272,79 @@ class TestLoadDataWithCompression extends QueryTest with BeforeAndAfterEach with
     }
   }
 
+  test("test current none compressor on legacy store with snappy") {
+    CarbonProperties.getInstance().addProperty(CarbonCommonConstants.ENABLE_OFFHEAP_SORT, "true")
+    CarbonProperties.getInstance().addProperty(CarbonCommonConstants.COMPRESSOR, "snappy")
+    createTable()
+    loadData()
+
+    CarbonProperties.getInstance().addProperty(CarbonCommonConstants.ENABLE_OFFHEAP_SORT, "true")
+    CarbonProperties.getInstance().addProperty(CarbonCommonConstants.COMPRESSOR, "none")
+    loadData()
+    checkAnswer(sql(s"SELECT count(*) FROM $tableName"), Seq(Row(16)))
 
 Review comment:
   Sure


[GitHub] [carbondata] asfgit closed pull request #3619: [HOTFIX] Remove unused fields in TableInfo

2020-02-14 Thread GitBox
asfgit closed pull request #3619: [HOTFIX] Remove unused fields in TableInfo
URL: https://github.com/apache/carbondata/pull/3619
 
 
   


[GitHub] [carbondata] Pickupolddriver commented on issue #3611: [CARBONDATA-3692] Support NoneCompression during loading data.

2020-02-14 Thread GitBox
Pickupolddriver commented on issue #3611: [CARBONDATA-3692] Support 
NoneCompression during loading data.
URL: https://github.com/apache/carbondata/pull/3611#issuecomment-586201467
 
 
   > @Pickupolddriver : Agree that it can improve the loading speed. But data 
will be 3x bigger. So, storage cost on OBS will be 3x more!
   
   The data is processed after it is loaded to OBS. So if we provide a NonCompressor, we avoid compressing the data only to decompress it again, and the uncompressed data is deleted once it has been processed in OBS.
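   Conceptually, a "none" compressor is an identity transform: bytes pass through unchanged, which saves the CPU cost of compressing and decompressing at the price of storing raw-sized data. A minimal sketch; the `ByteCompressor` interface and `NoneCompressorSketch` class are hypothetical and far narrower than CarbonData's actual compressor interface:

```java
import java.util.Arrays;

/** Hypothetical, deliberately tiny compressor contract. */
interface ByteCompressor {
  String getName();

  byte[] compress(byte[] input);

  byte[] decompress(byte[] input);
}

/** Identity "compressor": output bytes equal input bytes, so no CPU is spent on (de)compression. */
final class NoneCompressorSketch implements ByteCompressor {
  @Override
  public String getName() {
    return "none";
  }

  @Override
  public byte[] compress(byte[] input) {
    // Defensive copy so callers never share a mutable buffer
    return Arrays.copyOf(input, input.length);
  }

  @Override
  public byte[] decompress(byte[] input) {
    return Arrays.copyOf(input, input.length);
  }
}
```

   The "compressed" size always equals the raw size, which is exactly the storage trade-off raised in the quoted comment.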


[GitHub] [carbondata] akashrn5 commented on issue #3619: [HOTFIX] Remove unused fields in TableInfo

2020-02-14 Thread GitBox
akashrn5 commented on issue #3619: [HOTFIX] Remove unused fields in TableInfo
URL: https://github.com/apache/carbondata/pull/3619#issuecomment-586199051
 
 
   LGTM


[GitHub] [carbondata] CarbonDataQA1 commented on issue #3609: [CARBONDATA-3689] Support independent MV extension and MV syntax

2020-02-14 Thread GitBox
CarbonDataQA1 commented on issue #3609: [CARBONDATA-3689] Support independent 
MV extension and MV syntax
URL: https://github.com/apache/carbondata/pull/3609#issuecomment-586197154
 
 
   Build Success with Spark 2.4.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/286/
   


[GitHub] [carbondata] CarbonDataQA1 commented on issue #3538: [CARBONDATA-3637] Optimize insert into flow

2020-02-14 Thread GitBox
CarbonDataQA1 commented on issue #3538: [CARBONDATA-3637] Optimize insert into 
flow
URL: https://github.com/apache/carbondata/pull/3538#issuecomment-586188469
 
 
   Build Success with Spark 2.4.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/285/
   


[GitHub] [carbondata] CarbonDataQA1 commented on issue #3619: [HOTFIX] Remove unused fields in TableInfo

2020-02-14 Thread GitBox
CarbonDataQA1 commented on issue #3619: [HOTFIX] Remove unused fields in 
TableInfo
URL: https://github.com/apache/carbondata/pull/3619#issuecomment-586179412
 
 
   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1987/
   


[GitHub] [carbondata] CarbonDataQA1 commented on issue #3609: [CARBONDATA-3689] Support independent MV extension and MV syntax

2020-02-14 Thread GitBox
CarbonDataQA1 commented on issue #3609: [CARBONDATA-3689] Support independent 
MV extension and MV syntax
URL: https://github.com/apache/carbondata/pull/3609#issuecomment-586175115
 
 
   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1985/
   


[GitHub] [carbondata] CarbonDataQA1 commented on issue #3619: [HOTFIX] Remove unused fields in TableInfo

2020-02-14 Thread GitBox
CarbonDataQA1 commented on issue #3619: [HOTFIX] Remove unused fields in 
TableInfo
URL: https://github.com/apache/carbondata/pull/3619#issuecomment-586147787
 
 
   Build Success with Spark 2.4.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/284/
   


[GitHub] [carbondata] CarbonDataQA1 commented on issue #3609: [CARBONDATA-3689] Support independent MV extension and MV syntax

2020-02-14 Thread GitBox
CarbonDataQA1 commented on issue #3609: [CARBONDATA-3689] Support independent 
MV extension and MV syntax
URL: https://github.com/apache/carbondata/pull/3609#issuecomment-586144674
 
 
   Build Success with Spark 2.4.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/282/
   

