[GitHub] incubator-carbondata issue #635: [WIP]support SORT_COLUMNS

2017-03-15 Thread QiangCai
Github user QiangCai commented on the issue:

https://github.com/apache/incubator-carbondata/pull/635
  
@kumarvishal09 
1. If the user has not specified any sort columns, it follows the old flow: 
sorting is based on all dimension columns.
2. Yes.
3. During data loading, the start/end key of the blocklet info contains only 
the sort columns.
4. For data loading, only the sort columns are used to build the start/end 
key of the blocklet info. 
Code line: CarbonFactDataHandlerColumnar.java 1041
For select queries, only the sort columns are used to build the start/end 
key of the filters. 
Code line: FilterUtil.java 1159 and 1206
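
As a rough illustration of point 4, here is a minimal sketch of how a start/end key could be taken from the sort columns alone. It is not the code in CarbonFactDataHandlerColumnar.java; the class and method names are hypothetical, rows are assumed to hold int surrogate values, and the sort columns are assumed to be the leading columns of each row.

```java
import java.util.Arrays;

/** Hypothetical sketch; these are not CarbonData classes. */
final class SortColumnKeySketch {

  /** Keeps only the leading sort columns of a row (assumed layout). */
  static int[] sortKey(int[] row, int sortColumnCount) {
    return Arrays.copyOf(row, sortColumnCount);
  }

  /** Start key: sort-column prefix of the first row of a sorted blocklet. */
  static int[] startKey(int[][] sortedRows, int sortColumnCount) {
    return sortKey(sortedRows[0], sortColumnCount);
  }

  /** End key: sort-column prefix of the last row of a sorted blocklet. */
  static int[] endKey(int[][] sortedRows, int sortColumnCount) {
    return sortKey(sortedRows[sortedRows.length - 1], sortColumnCount);
  }
}
```

Because the rows are sorted only by the sort columns, the first and last rows bound the blocklet on exactly those columns; the values of the other columns carry no ordering information and are left out of the key.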

@ravipesala 
I have removed the date & timestamp datatypes from no-dictionary.
Better to raise another PR to implement the new numeric datatype encoding.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-carbondata issue #635: [WIP]support SORT_COLUMNS

2017-03-15 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/incubator-carbondata/pull/635
  
Build Success with Spark 1.6.2, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/1174/





[GitHub] incubator-carbondata issue #635: [WIP]support SORT_COLUMNS

2017-03-15 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/incubator-carbondata/pull/635
  
Build Success with Spark 1.6.2, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/1172/





[GitHub] incubator-carbondata issue #635: [WIP]support SORT_COLUMNS

2017-03-15 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/incubator-carbondata/pull/635
  
Build Failed with Spark 1.6.2, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/1171/





[GitHub] incubator-carbondata issue #635: [WIP]support SORT_COLUMNS

2017-03-14 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/incubator-carbondata/pull/635
  
Build Success with Spark 1.6.2, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/1140/





[GitHub] incubator-carbondata issue #635: [WIP]support SORT_COLUMNS

2017-03-10 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/incubator-carbondata/pull/635
  
Build Failed with Spark 1.6.2, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/1078/





[GitHub] incubator-carbondata issue #635: [WIP]support SORT_COLUMNS

2017-03-09 Thread kumarvishal09
Github user kumarvishal09 commented on the issue:

https://github.com/apache/incubator-carbondata/pull/635
  
@QiangCai I have some queries related to this PR.
1. If the user has not specified any sort column, will it go to the old flow 
(sorting based on all dimension columns), or will the data not be sorted?
2. If the data is not sorted, we cannot use a B+ tree; we need to use some 
other linear data structure like an array or linked list. I have not seen any 
changes related to this.
3. The B-tree is created based on the sort columns, so with this PR we need 
to update the B-tree loading, as only the sort columns will participate in 
creating the B-tree.
4. How are you creating the start key and end key, given that only the sort 
columns can participate in both keys? The B-tree jump will not work if other 
columns (besides the sort columns) participate in the start and end key.
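
To make the concern in the last point concrete, here is a minimal sketch of key-range pruning. It is an assumption-laden illustration, not CarbonData code: the names are hypothetical, keys are int arrays compared column by column, and a linear scan stands in for the B-tree traversal.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; these are not CarbonData classes. */
final class BlockletPruneSketch {

  /** Column-by-column comparison of two keys; a shorter key sorts first on ties. */
  static int compare(int[] a, int[] b) {
    int n = Math.min(a.length, b.length);
    for (int i = 0; i < n; i++) {
      int c = Integer.compare(a[i], b[i]);
      if (c != 0) {
        return c;
      }
    }
    return Integer.compare(a.length, b.length);
  }

  /** Indexes of blocklets whose [start, end] key range may contain the filter key. */
  static List<Integer> candidates(int[][] startKeys, int[][] endKeys, int[] filterKey) {
    List<Integer> hits = new ArrayList<>();
    for (int i = 0; i < startKeys.length; i++) {
      boolean notBefore = compare(startKeys[i], filterKey) <= 0;
      boolean notAfter = compare(filterKey, endKeys[i]) <= 0;
      if (notBefore && notAfter) {
        hits.add(i);
      }
    }
    return hits;
  }
}
```

The range test is only valid when every column in the keys is one the data is actually sorted by. If an unsorted column were included, a blocklet's first and last rows would no longer bound its values for that column, and matching blocklets could be skipped.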





[GitHub] incubator-carbondata issue #635: [WIP]support SORT_COLUMNS

2017-03-08 Thread QiangCai
Github user QiangCai commented on the issue:

https://github.com/apache/incubator-carbondata/pull/635
  
@ravipesala  
Is it necessary to require that the sort_columns come from dimensions?
If the table needs to be sorted by a measure, we should use 
dictionary_include to add it to the dimension list.




[GitHub] incubator-carbondata issue #635: [WIP]support SORT_COLUMNS

2017-03-08 Thread QiangCai
Github user QiangCai commented on the issue:

https://github.com/apache/incubator-carbondata/pull/635
  
@ravipesala I have listed the tasks. 
It would be better to implement another direct-dictionary encoding for 
numeric datatype columns. We can remove the dimension and measure concepts 
and use only the column concept. The encoding of a column will be decided by 
its datatype and the table properties.




[GitHub] incubator-carbondata issue #635: [WIP]support SORT_COLUMNS

2017-03-08 Thread ravipesala
Github user ravipesala commented on the issue:

https://github.com/apache/incubator-carbondata/pull/635
  
@QiangCai Please mention which tasks you are doing in this PR. It is better 
to stick to supporting only sort_columns in this PR. Other tasks can be 
pushed to other PRs.




[GitHub] incubator-carbondata issue #635: [WIP]support SORT_COLUMNS

2017-03-08 Thread QiangCai
Github user QiangCai commented on the issue:

https://github.com/apache/incubator-carbondata/pull/635
  
@ravipesala Good suggestion. Direct dictionary is better than no 
dictionary. I will add it.




[GitHub] incubator-carbondata issue #635: [WIP]support SORT_COLUMNS

2017-03-08 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/incubator-carbondata/pull/635
  
Build Success with Spark 1.6.2, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/1047/





[GitHub] incubator-carbondata issue #635: [WIP]support SORT_COLUMNS

2017-03-08 Thread ravipesala
Github user ravipesala commented on the issue:

https://github.com/apache/incubator-carbondata/pull/635
  
@QiangCai I have a few doubts.
Why are primitive data types supported as no-dictionary columns in this PR? 
They are supposed to be direct dictionary.
Why are date and timestamp supported in no-dictionary? They already have 
direct dictionary support, which is much more efficient in terms of loading 
and query.
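
For readers unfamiliar with the term, a direct dictionary derives the surrogate value arithmetically from the column value, so no lookup table has to be built or loaded. The sketch below is only an illustration of that idea under stated assumptions: the class, the cutoff, the one-second granularity, and the reserved surrogate for null are all hypothetical and are not taken from CarbonData's implementation.

```java
/** Hypothetical sketch of a direct dictionary for timestamps. */
final class DirectDictionarySketch {

  // Assumed base timestamp (epoch) and granularity (one second).
  static final long CUTOFF_MILLIS = 0L;
  static final long GRANULARITY_MILLIS = 1000L;

  // Assumed convention: surrogate 1 is reserved for null, real values start at 2.
  static final int NULL_SURROGATE = 1;
  static final int FIRST_VALUE_SURROGATE = 2;

  /** Computes the surrogate directly from the value; no lookup table is needed. */
  static int encode(long epochMillis) {
    if (epochMillis < CUTOFF_MILLIS) {
      throw new IllegalArgumentException("timestamp is before the cutoff");
    }
    long steps = (epochMillis - CUTOFF_MILLIS) / GRANULARITY_MILLIS;
    return Math.toIntExact(steps + FIRST_VALUE_SURROGATE);
  }

  /** Recovers the timestamp, truncated to the assumed granularity. */
  static long decode(int surrogate) {
    return (surrogate - (long) FIRST_VALUE_SURROGATE) * GRANULARITY_MILLIS + CUTOFF_MILLIS;
  }
}
```

Since the mapping is monotonic, comparing surrogates gives the same order as comparing the original timestamps, which is what makes such a column usable for sorting and range filters without decoding.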

I think the scope of this PR should be limited to the following points.
1. Support sort_columns in DDL and metadata.
2. In the old flow, all columns with dictionary_include and 
dictionary_exclude already become sort columns and the remaining ones are 
measures. Now there would no longer be any measure concept, so the 
sort_columns should be sorted and have a row-id index, and the remaining 
columns should not be sorted or have a row-id index but should have 
value/delta compression if they are of a numeric datatype.
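
As a rough sketch of what delta compression means for an unsorted numeric column: store the minimum once and keep each value as an offset from it, so the offsets fit in a narrower type. The class and method names below are hypothetical, the input is assumed to be non-empty and free of overflow, and this is not CarbonData's actual codec.

```java
/** Hypothetical sketch of delta encoding against the column minimum. */
final class DeltaEncodingSketch {

  /** Minimum of a non-empty array; used as the base value. */
  static long min(long[] values) {
    long m = values[0];
    for (long v : values) {
      m = Math.min(m, v);
    }
    return m;
  }

  /** Replaces each value with its non-negative offset from the base. */
  static long[] encode(long[] values, long base) {
    long[] deltas = new long[values.length];
    for (int i = 0; i < values.length; i++) {
      deltas[i] = values[i] - base;
    }
    return deltas;
  }

  /** Restores the original values from the offsets. */
  static long[] decode(long[] deltas, long base) {
    long[] values = new long[deltas.length];
    for (int i = 0; i < deltas.length; i++) {
      values[i] = deltas[i] + base;
    }
    return values;
  }

  /** Narrowest of 1, 2, 4 or 8 bytes that can hold the largest offset. */
  static int bytesPerValue(long maxDelta) {
    if (maxDelta <= Byte.MAX_VALUE) {
      return 1;
    }
    if (maxDelta <= Short.MAX_VALUE) {
      return 2;
    }
    if (maxDelta <= Integer.MAX_VALUE) {
      return 4;
    }
    return 8;
  }
}
```

For example, the values 100000, 100003 and 100001 become the offsets 0, 3 and 1 against a base of 100000, and each offset fits in a single byte instead of eight.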

I feel it would have been better to have some discussion on the mailing list 
before starting the implementation, to keep people in sync with you and 
avoid unnecessary rework.




[GitHub] incubator-carbondata issue #635: [WIP]support SORT_COLUMNS

2017-03-08 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/incubator-carbondata/pull/635
  
Build Failed with Spark 1.6.2, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/1041/


