Re: [DISCUSSION] Update the function of show segments
Yeah, agree with Ravi. We can keep both "show segments" and "show extended segments". @xuchuanyin, as far as I know, the result of show segments is currently formatted. Regards. Chenerlu. -- Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
[DISCUSSION] Update the function of show segments
Hi, dev. Currently I am thinking about the function of show segments. We can see the segments of a carbon table by executing this command, but it only returns segmentId, status, load start time and load end time, and all of this information comes from tablestatus. I think this may not be enough for users to understand the situation of each segment, so I want to add two fields: one is the number of CarbonData files under the segment folder, the other is the number of carbon index files under the segment folder. Any suggestions about this idea? Welcome to discuss. Regards. Chenerlu.
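The two proposed fields could be computed by a simple scan of the segment folder. A minimal sketch, assuming the segment directory layout and the ".carbondata" / ".carbonindex" file extensions for illustration (not the exact store layout):

```python
import os

def count_segment_files(segment_dir):
    """Count data files and index files in a segment folder.

    The extensions used here (.carbondata, .carbonindex) are
    assumptions for illustration only.
    """
    data_files = 0
    index_files = 0
    for name in os.listdir(segment_dir):
        if name.endswith(".carbondata"):
            data_files += 1
        elif name.endswith(".carbonindex"):
            index_files += 1
    return data_files, index_files
```

The actual implementation would list files through CarbonData's file-system abstraction rather than os.listdir, but the counting logic is the same.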
[DISCUSSION] Unify the sort column and sort scope in create table command
1 Requirement

Currently, users can specify sort columns in table properties when creating a table, and when loading data, users can also specify the sort scope in load options. To improve ease of use, it would be better to specify all sort-related parameters in the create table command. Once the sort scope is specified in the create table command, it will be used for data loading even if users have also specified it in load options.

2 Detailed design

2.1 Task-01
Requirement: Create table supports specifying the sort scope.
Implementation: Make use of table properties (a Map); the sort scope will be specified as a key/value pair in table properties, and the existing interface will be called to write this key/value pair into the metastore. Global Sort, Local Sort and No Sort will be supported, and can be specified in the SQL command:

CREATE TABLE tableWithGlobalSort (
  shortField SHORT,
  intField INT,
  bigintField LONG,
  doubleField DOUBLE,
  stringField STRING,
  timestampField TIMESTAMP,
  decimalField DECIMAL(18,2),
  dateField DATE,
  charField CHAR(5)
)
STORED BY 'carbondata'
TBLPROPERTIES('SORT_COLUMNS'='stringField', 'SORT_SCOPE'='GLOBAL_SORT')

Tips: If the sort scope is Global Sort, users should specify GLOBAL_SORT_PARTITIONS; if they do not, the number of map tasks will be used. GLOBAL_SORT_PARTITIONS should be an Integer in the range [1, Integer.MaxValue], and it is only used when the sort scope is Global Sort.

Global Sort: uses the orderBy operator in Spark; data is ordered at segment level.
Local Sort: ordered per node; a CarbonData file is ordered if it is written by one task.
No Sort: no sorting.

Tips: keys and values are case-insensitive.

2.2 Task-02
Requirement: Data loading will support Local Sort, No Sort and Global Sort, ignoring the sort scope specified in load options and using the parameter specified at create table. Currently, users can specify the sort scope and global sort partitions in load options; after this modification, the sort scope specified in load options will be ignored and the sort scope will be taken from table properties.
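The Task-01 property rules above (case-insensitive keys and values, GLOBAL_SORT_PARTITIONS restricted to [1, Integer.MaxValue] and to global sort) could be validated as follows. This is a sketch under the naming in this design, not the actual CarbonData parser:

```python
VALID_SCOPES = {"GLOBAL_SORT", "LOCAL_SORT", "NO_SORT"}

def parse_sort_properties(tblproperties):
    """Validate SORT_SCOPE / GLOBAL_SORT_PARTITIONS from table properties.

    Illustrative only: property names follow this design; the checks
    mirror the 'Tips' notes above.
    """
    # keys and values are case-insensitive
    props = {k.upper(): v for k, v in tblproperties.items()}
    scope = props.get("SORT_SCOPE", "").upper() or None
    if scope is not None and scope not in VALID_SCOPES:
        raise ValueError("invalid SORT_SCOPE: %s" % scope)
    partitions = props.get("GLOBAL_SORT_PARTITIONS")
    if partitions is not None:
        partitions = int(partitions)
        if not (1 <= partitions <= 2**31 - 1):   # [1, Integer.MaxValue]
            raise ValueError("GLOBAL_SORT_PARTITIONS out of range")
        if scope != "GLOBAL_SORT":
            raise ValueError("GLOBAL_SORT_PARTITIONS is only valid for global sort")
    return scope, partitions
```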
Current logic: sort scope comes from load options.

No. | Prerequisite                                     | Sort scope
1   | isSortTable is true && sort scope is Global Sort | Global Sort (checked first)
2   | isSortTable is false                             | No Sort
3   | isSortTable is true                              | Local Sort

Tips: isSortTable is true means the table contains sort columns, or it contains dimensions (except complex types), such as string columns. For example:

Create table xxx1 (col1 string, col2 int) stored by 'carbondata' -- sort table
Create table xx1 (col1 int, col2 int) stored by 'carbondata' -- not a sort table
Create table xx (col1 int, col2 string) stored by 'carbondata' tblproperties ('sort_columns'='col1') -- sort table

New logic: sort scope comes from create table.

No. | Prerequisite                                      | Code branch
1   | isSortTable is true && sort scope is Global Sort  | Global Sort (checked first)
2   | isSortTable is false || sort scope is No Sort     | No Sort
3   | isSortTable is true && sort scope is Local Sort   | Local Sort
4   | isSortTable is true, sort scope not specified     | Local Sort (keep current logic)

3 Acceptance standard

No. | Acceptance standard
1   | Users can specify the sort scope (global, local, no sort) when creating a carbon table in SQL.
2   | Data loading will ignore the sort scope specified in load options and use the parameter specified in the create table command. If users still specify the sort scope in load options, a warning will inform them that the sort scope specified at create table will be used.

Here is my JIRA: https://issues.apache.org/jira/browse/CARBONDATA-1438. You can see my simple design above. But I am undecided between two options for loading data with a sort scope specified.

Option 1: As in the design document, just ignore the sort scope specified in load options and give a warning message, using the sort scope specified in the create table command; if the table was created without a sort scope, the sort scope specified in load options is still never used.
Option 2: The sort scope in the create table command has higher priority than the sort scope specified in load options, which means that if the table was created without a sort scope, the sort scope specified in load options will be used. Any ideas about these two options? Regards. Chenerlu.
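The difference between the two options can be sketched as a small resolver. This is a hypothetical helper for discussion, not CarbonData code: Option 1 always ignores the load-option scope, Option 2 falls back to it when the table did not specify one, and the final branch follows the "new logic" decision table above:

```python
def resolve_sort_scope(table_scope, load_scope, is_sort_table, option=1):
    """Resolve the effective sort scope under the two proposed options.

    table_scope / load_scope are "GLOBAL_SORT", "LOCAL_SORT",
    "NO_SORT", or None when not specified.
    """
    if option == 1:
        scope = table_scope                      # load-option scope is always ignored
    else:
        scope = table_scope if table_scope else load_scope
    # apply the new-logic decision table
    if is_sort_table and scope == "GLOBAL_SORT":
        return "GLOBAL_SORT"                     # checked first
    if not is_sort_table or scope == "NO_SORT":
        return "NO_SORT"
    return "LOCAL_SORT"                          # sort table, Local Sort or unspecified
```

For a sort table created without SORT_SCOPE but loaded with a global-sort load option, Option 1 yields Local Sort while Option 2 yields Global Sort, which is exactly the behavioral difference being discussed.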
Re: Apache CarbonData 6th meetup in Shanghai on 2nd Sep,2017 at : https://jinshuju.net/f/X8x5S9?from=timeline
Looking forward to the conference!! Regards. Chenerlu -- View this message in context: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/Apache-CarbonData-6th-meetup-in-Shanghai-on-2nd-Sep-2017-at-https-jinshuju-net-f-X8x5S9-from-timeline-tp20693p20731.html Sent from the Apache CarbonData Dev Mailing List archive mailing list archive at Nabble.com.
Re: [DISCUSSION] About data backward compatibility
Agree with caolu. I think users may be confused by having many formats. In the future, it would be better for carbon to unify the data format, and the unified format should be compatible with previous formats. If it is unavoidable to introduce different formats to support different use cases for better performance, I think we can add configuration parameters within this unified format. The key point is that CarbonData should have only one format: it will be easier for users to understand and also easier for developers to extend. Regards. Chenerlu.
Re: problem with branch-1.1
The key point is that the versions of Spark and CarbonData should match. Regards. Chenerlu.
Re: [VOTE] Apache CarbonData 1.1.1(RC1) release
+1 Regards. Chenerlu.
Re: [DISCUSSION] Propose to remove "support spark 1.5" from CarbonData 1.2.0 onwards
I think it is OK to keep supporting Spark 1.5 without IUD for now. If users upgrade to Spark 2.1 or Spark 2.2 in the future, we can remove the support once few users remain on Spark 1.5. Is there any special reason that led to this removal? Regards. Chenerlu.
Re: [Discussion] CarbonOutputFormat Implementation
Hi Divya, Thanks for your suggestion. CarbonData may support it in the near future. If you want to contribute this feature, I think it will benefit the community a lot. Regards. Chenerlu.
Re: Integrate Document Checker
Hi Jatin, Agree with you. The CarbonData community needs such useful tools to improve documentation quality. Thanks. Regards. Chenerlu.
Re: Difference in decimal values for variance in Presto.
Hi, OK, thanks very much. If you find something wrong in CarbonData, we can discuss it here. Regards. Chenerlu.
Re: problem with branch-1.1
Hi,
Please try:

mvn package -DskipTests -Pspark-2.1 -Dspark.version=2.1.0 -Phadoop-2.7.2

with Hadoop 2.7.2 and Spark 2.1. I have just tested it; it compiles OK:

[INFO] Reactor Summary:
[INFO]
[INFO] Apache CarbonData :: Parent ............... SUCCESS [  1.657 s]
[INFO] Apache CarbonData :: Common ............... SUCCESS [  1.870 s]
[INFO] Apache CarbonData :: Core ................. SUCCESS [ 25.003 s]
[INFO] Apache CarbonData :: Processing ........... SUCCESS [  1.941 s]
[INFO] Apache CarbonData :: Hadoop ............... SUCCESS [  2.017 s]
[INFO] Apache CarbonData :: Spark Common ......... SUCCESS [ 20.622 s]
[INFO] Apache CarbonData :: Spark2 ............... SUCCESS [ 39.956 s]
[INFO] Apache CarbonData :: Spark Common Test .... SUCCESS [  4.024 s]
[INFO] Apache CarbonData :: Assembly ............. SUCCESS [  3.400 s]
[INFO] Apache CarbonData :: Spark2 Examples ...... SUCCESS [  9.718 s]
[INFO]
[INFO] BUILD SUCCESS
[INFO]
[INFO] Total time: 01:50 min
[INFO] Finished at: 2017-06-23T17:55:28+08:00
[INFO] Final Memory: 83M/860M

bogon:carbondata erlu$ git branch
* branch-1.1

Regards. Chenerlu
Re: [DISCUSSION] Whether Carbondata should keep carbon-spark-shell script
Hi community, Any comments on this topic? If no one objects, I will raise a PR to remove this feature. Regards. Chenerlu.
Re: can't apply mappartitions to dataframe generated from carboncontext
Hi, can you share your test steps for reproducing this issue? I mean the complete test steps. Thanks. Chenerlu.
Re: Reply: can't apply mappartitions to dataframe generated from carboncontext
Hi, I think you can debug on Windows by adding some debug parameters when starting spark-shell on Linux; this is called remote debugging. I tried this method when I was using Windows; hope this helps. Regards. Chenerlu.
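One common way to do this (an illustrative setup, not from the original mail; the port is an example) is to pass standard JDWP agent options to the shell's JVM via SPARK_SUBMIT_OPTS, then attach a remote debugger from the IDE on Windows:

```shell
# Make the JVM behind spark-shell listen for a remote debugger on port 5005
# (suspend=n means the shell starts without waiting for the debugger).
export SPARK_SUBMIT_OPTS="-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5005"
# then start the shell as usual and attach a "Remote" debug configuration:
# spark-shell
```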
Re: Some questions about compiling carbondata
Hi, For question one, I have raised a discussion about carbon-spark-shell for Spark 2.x at the following link: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/DISCUSSION-Whether-Carbondata-should-keep-carbon-spark-shell-script-td14077.html Actually there is a PR to fix carbon-spark-shell for Spark 2.x, but I think this script is useless. You can give your opinion, and we can vote on it. Regards. Chenerlu
Re: can't apply mappartitions to dataframe generated from carboncontext
Hi, Mic sun, Can you paste your error message directly? It seems I can't access your attachment. Thanks in advance. Regards. Chenerlu.
Re: [DISCUSSION] Whether Carbondata should support Spark-2.2 in the next release version(1.2.0)
Hi, xm_zzc, I support CarbonData 1.2.0 + Spark 2.1, because Spark 2.2 may not be stable yet, having just been released. Regards. Chenerlu.
Re: [DISCUSSION] Whether Carbondata should keep carbon-sql-shell script
Hi, Ravindra. Users can learn how to use CarbonData through the QUICK START document; users should know how it works, and this script just wraps a few steps to get an existing CarbonSession. This is carbon API usage, and I think the community will spend much time maintaining this script, which will do more harm than good. Right now carbon-spark-shell has some problems integrating with Spark 2.1. What you said applies to carbon-spark-sql rather than carbon-spark-shell, because carbon-spark-sql provides a way to use SQL commands. Regards, Chenerlu.
Re: [DISCUSSION] Whether Carbondata should keep carbon-sql-shell script
Thanks for correcting my mistake. Yes, just carbon-spark-shell. I think carbon-spark-sql is more helpful than carbon-spark-shell, because it provides a way to interact with CarbonData via SQL commands rather than the carbon API. Based on what I mentioned above, I think CarbonData can still keep carbon-spark-sql. Regards. Chenerlu.
[DISCUSSION] Whether Carbondata should keep carbon-sql-shell script
Hi community, Recently I reviewed the implementation of carbon-sql-shell and tried to understand the function of this script. It just wraps some steps and provides an existing CarbonContext or CarbonSession for users to interact with CarbonData. In my opinion we can remove this script, because it is useless apart from providing an existing CarbonContext or CarbonSession. Reasons as below:
1. CarbonData now has integrations for both Spark 1.x and Spark 2.x, so CarbonData would have to refactor carbon-spark-shell every time Spark updates.
2. After running this script, it generates redundant folders in the project, and users may forget to remove them.
3. The CarbonContext or CarbonSession may be created with a store path and metastore path that the user may not want.
I just share my idea about this; we can discuss whether we should keep this script. Thanks. Regards. Chenerlu.
Re: [jira] [Created] (CARBONDATA-1114) Failed to run tests in windows env
Hi, xuchuanyin, I think many of the failed test cases may be caused by Windows paths being different from Linux paths. I have tested on my Mac in local mode, and all the test cases you mentioned passed. Regards. Chenerlu.