[GitHub] incubator-carbondata issue #635: [WIP]support SORT_COLUMNS
Github user QiangCai commented on the issue: https://github.com/apache/incubator-carbondata/pull/635
@kumarvishal09
1. If the user has not mentioned any sort column, it will go to the old flow, sorting based on all dimension columns.
2. Yes.
3. During data loading, the start/end key of the blocklet info contains only sort columns.
4. For data loading, just use sort columns to build the start/end key of the blocklet info. Code line: CarbonFactDataHandlerColumnar.java 1041. For select queries, just use sort columns to build the start/end key of filters. Code line: FilterUtil.java 1159 and 1206.
@ravipesala I have removed the date & timestamp datatypes from no-dictionary. Better to raise another PR to implement the new numeric datatype encoding.
--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
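The idea in points 3 and 4 above can be illustrated with a minimal sketch. It is not the code in CarbonFactDataHandlerColumnar.java or FilterUtil.java; the class `SortColumnKeys` and all of its methods are hypothetical names, and it assumes that each row is an array of integer surrogate values with the sort columns stored first, and that the rows of a blocklet are already sorted by those columns.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration only; these are not CarbonData classes. */
final class SortColumnKeys {

  /** Keeps only the leading sortColumnCount values of a row (assumed layout). */
  public static int[] sortKey(int[] row, int sortColumnCount) {
    return Arrays.copyOf(row, sortColumnCount);
  }

  /** Start/end key of a blocklet whose rows are already sorted by the sort columns. */
  public static int[][] startEndKeys(List<int[]> sortedRows, int sortColumnCount) {
    int[] start = sortKey(sortedRows.get(0), sortColumnCount);
    int[] end = sortKey(sortedRows.get(sortedRows.size() - 1), sortColumnCount);
    return new int[][] {start, end};
  }

  /** Lexicographic comparison of two keys. */
  public static int compare(int[] a, int[] b) {
    int n = Math.min(a.length, b.length);
    for (int i = 0; i < n; i++) {
      int c = Integer.compare(a[i], b[i]);
      if (c != 0) {
        return c;
      }
    }
    return Integer.compare(a.length, b.length);
  }

  /** True when the filter range [filterStart, filterEnd] overlaps [startKey, endKey]. */
  public static boolean mayContain(int[] startKey, int[] endKey,
                                   int[] filterStart, int[] filterEnd) {
    return compare(filterEnd, startKey) >= 0 && compare(filterStart, endKey) <= 0;
  }
}
```

Because the filter keys are built from the same sort columns as the blocklet keys, the two can be compared directly, and a blocklet whose key range does not overlap the filter range can be skipped.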
[GitHub] incubator-carbondata issue #635: [WIP]support SORT_COLUMNS
Github user CarbonDataQA commented on the issue: https://github.com/apache/incubator-carbondata/pull/635 Build Success with Spark 1.6.2, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/1174/
[GitHub] incubator-carbondata issue #635: [WIP]support SORT_COLUMNS
Github user CarbonDataQA commented on the issue: https://github.com/apache/incubator-carbondata/pull/635 Build Success with Spark 1.6.2, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/1172/
[GitHub] incubator-carbondata issue #635: [WIP]support SORT_COLUMNS
Github user CarbonDataQA commented on the issue: https://github.com/apache/incubator-carbondata/pull/635 Build Failed with Spark 1.6.2, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/1171/
[GitHub] incubator-carbondata issue #635: [WIP]support SORT_COLUMNS
Github user CarbonDataQA commented on the issue: https://github.com/apache/incubator-carbondata/pull/635 Build Success with Spark 1.6.2, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/1140/
[GitHub] incubator-carbondata issue #635: [WIP]support SORT_COLUMNS
Github user CarbonDataQA commented on the issue: https://github.com/apache/incubator-carbondata/pull/635 Build Failed with Spark 1.6.2, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/1078/
[GitHub] incubator-carbondata issue #635: [WIP]support SORT_COLUMNS
Github user kumarvishal09 commented on the issue: https://github.com/apache/incubator-carbondata/pull/635
@QiangCai I have queries related to this PR.
1. If the user has not mentioned any sort column, will it go to the old flow (sorting based on all dimension columns), or will the data not be sorted?
2. If the data is not sorted, we cannot use a B+ tree; we need to use some other linear data structure such as an array or linked list. I have not seen any changes related to this.
3. The B-tree is created based on the sort columns, so based on this PR we need to update the B-tree loading, as only sort columns will participate in creating the B-tree.
4. How are you creating the start key and end key, given that only sort columns can participate in both keys? The B-tree jump will not work if other columns (except sort columns) participate in the start and end keys.
[GitHub] incubator-carbondata issue #635: [WIP]support SORT_COLUMNS
Github user QiangCai commented on the issue: https://github.com/apache/incubator-carbondata/pull/635
@ravipesala Is it necessary to require that the sort_columns come from dimensions? If the table needs to be sorted by a measure, we should use dictionary_include to add it to the dimension list.
[GitHub] incubator-carbondata issue #635: [WIP]support SORT_COLUMNS
Github user QiangCai commented on the issue: https://github.com/apache/incubator-carbondata/pull/635
@ravipesala I have listed the tasks. It would be better to implement another direct-dictionary encoding for numeric datatype columns. We can remove the dimension and measure concepts and use only the column concept. The encoding of a column will be decided by the datatype of the column and the table properties.
[GitHub] incubator-carbondata issue #635: [WIP]support SORT_COLUMNS
Github user ravipesala commented on the issue: https://github.com/apache/incubator-carbondata/pull/635
@QiangCai Please mention which tasks you are doing in this PR. It is better to stick to only supporting sort_columns in this PR. Other tasks can be pushed to other PRs.
[GitHub] incubator-carbondata issue #635: [WIP]support SORT_COLUMNS
Github user QiangCai commented on the issue: https://github.com/apache/incubator-carbondata/pull/635
@ravipesala Good suggestion. Direct dictionary is better than no dictionary. I will add it.
[GitHub] incubator-carbondata issue #635: [WIP]support SORT_COLUMNS
Github user CarbonDataQA commented on the issue: https://github.com/apache/incubator-carbondata/pull/635 Build Success with Spark 1.6.2, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/1047/
[GitHub] incubator-carbondata issue #635: [WIP]support SORT_COLUMNS
Github user ravipesala commented on the issue: https://github.com/apache/incubator-carbondata/pull/635
@QiangCai I have a few doubts. Why are primitive data types supported as no-dictionary columns in this PR? They are supposed to be direct dictionary. Why are date and timestamp supported as no-dictionary? They already have direct dictionary support, which is much more efficient in terms of loading and query. I think the scope of this PR should be limited to the following points.
1. Support sort_columns in DDL and metadata.
2. In the old flow, all columns with dictionary_include and dictionary_exclude become sort_columns and the remaining columns are measures. Now there would be no measure concept, so sort_columns should be sorted and have a row-id index, and the remaining columns should not be sorted or have a row-id index, but should have value/delta compression if they are of a numeric datatype.
I feel it would have been better to have some discussion on the mailing list before starting the implementation, to keep people in sync with you and avoid unnecessary rework.
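For readers unfamiliar with the value/delta compression mentioned in point 2, here is a generic sketch of the technique. It is not CarbonData's encoder; the class `DeltaEncodingSketch` and its methods are hypothetical, and it assumes a non-empty column of long values whose range (max minus min) does not overflow a long.

```java
/** Hypothetical illustration of delta-from-minimum encoding; not CarbonData code. */
final class DeltaEncodingSketch {

  /** Smallest value in a non-empty column. */
  public static long min(long[] values) {
    if (values.length == 0) {
      throw new IllegalArgumentException("column must not be empty");
    }
    long m = values[0];
    for (long v : values) {
      m = Math.min(m, v);
    }
    return m;
  }

  /** Stores each value as its non-negative offset from the column minimum. */
  public static long[] encode(long[] values, long min) {
    long[] deltas = new long[values.length];
    for (int i = 0; i < values.length; i++) {
      deltas[i] = values[i] - min;
    }
    return deltas;
  }

  /** Restores the original values from the offsets and the minimum. */
  public static long[] decode(long[] deltas, long min) {
    long[] values = new long[deltas.length];
    for (int i = 0; i < deltas.length; i++) {
      values[i] = deltas[i] + min;
    }
    return values;
  }

  /** Bytes needed to store an unsigned offset no larger than maxDelta. */
  public static int bytesNeeded(long maxDelta) {
    if (maxDelta <= 0xFFL) {
      return 1;
    }
    if (maxDelta <= 0xFFFFL) {
      return 2;
    }
    if (maxDelta <= 0xFFFFFFFFL) {
      return 4;
    }
    return 8;
  }
}
```

When the values of a column are clustered, the offsets fit in fewer bytes than the raw values, which is why this kind of encoding suits unsorted numeric columns that carry no row-id index.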
[GitHub] incubator-carbondata issue #635: [WIP]support SORT_COLUMNS
Github user CarbonDataQA commented on the issue: https://github.com/apache/incubator-carbondata/pull/635 Build Failed with Spark 1.6.2, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/1041/