Re: [DISCUSSION] Update the function of show segments

2017-09-22 Thread Erlu Chen
Yeah, I agree with Ravi.

We can keep both "show segments" and "show extended segments".

@xuchuanyin, as far as I know, the result of show segments is currently formatted.

Regards.
Chenerlu.



--
Sent from: 
http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/


[DISCUSSION] Update the function of show segments

2017-09-16 Thread Erlu Chen
Hi, dev

Currently, I am thinking about the function of show segments. We can see the
segments of a carbon table by executing this command, but it only returns the
segment ID, status, load start time, and load end time, all of which come from
tablestatus. I think this may not be enough for users to understand the
situation of each segment, so I want to add two fields: the number of carbon
data files under the segment folder, and the number of carbon index files
under the segment folder.
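
For illustration, the extended output might look like the sketch below (the
two new column names are hypothetical, not part of the current syntax):

```sql
SHOW SEGMENTS FOR TABLE carbon_table;

-- Current columns:  SegmentId | Status | Load Start Time | Load End Time
-- Proposed columns (hypothetical names):
--   ... | Data File Count | Index File Count
```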

Any suggestion about my idea ?

Welcome to communicate.

Regards.
Chenerlu.





[DISCUSSION] Unify the sort column and sort scope in create table command

2017-08-31 Thread Erlu Chen
1   Requirement
Currently, users can specify sort columns in table properties when creating a
table, and when loading data, users can also specify the sort scope in load
options.
To improve ease of use, it would be better to specify all sort-related
parameters in the create table command.
Once the sort scope is specified in the create table command, it will be used
during data load even if users have also specified it in load options.
2   Detailed design
2.1 Task-01
Requirement: Create table can support specifying the sort scope.
Implement: Use table properties (Map); the sort scope will be specified in
table properties as a key/value pair, then the existing interface will be
called to write this key/value pair into the metastore.
Global Sort, Local Sort, and No Sort will be supported; they can be specified
in the SQL command:
CREATE TABLE tableWithGlobalSort (
  shortField SHORT,
  intField INT,
  bigintField LONG,
  doubleField DOUBLE,
  stringField STRING,
  timestampField TIMESTAMP,
  decimalField DECIMAL(18,2),
  dateField DATE,
  charField CHAR(5)
)
STORED BY 'carbondata'
TBLPROPERTIES ('SORT_COLUMNS'='stringField', 'SORT_SCOPE'='GLOBAL_SORT')
Tips: If the sort scope is Global Sort, users should specify
GLOBAL_SORT_PARTITIONS. If they do not, the number of map tasks will be used.
GLOBAL_SORT_PARTITIONS should be an Integer in the range [1, Integer.MAX_VALUE];
it is only used when the sort scope is Global Sort.

Global Sort  Uses the order-by operator in Spark; data is ordered at the segment level.
Local Sort   Ordered per node; a carbondata file is ordered if it is written by one task.
No Sort      No sorting.

Tips: keys and values are case-insensitive.
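
For example, a global-sort table with an explicit partition count might look
like the sketch below (the table name and the GLOBAL_SORT_PARTITIONS value of
8 are just illustrations, following the property names proposed above):

```sql
CREATE TABLE tableWithGlobalSortPartitions (
  intField INT,
  stringField STRING
)
STORED BY 'carbondata'
TBLPROPERTIES (
  'SORT_COLUMNS'='stringField',
  'SORT_SCOPE'='GLOBAL_SORT',
  'GLOBAL_SORT_PARTITIONS'='8'
)
```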
2.2 Task-02
Requirement:
Load data will support Local Sort, No Sort, and Global Sort.
The sort scope specified in load options will be ignored in favor of the
parameter specified in create table.
Currently, users can specify the sort scope and global sort partitions in load
options. After this modification, the sort scope in load options will be
ignored and the sort scope will be read from table properties.
Current logic: the sort scope comes from load options.
Number  Prerequisite                                        Sort scope
1       isSortTable is true && sort scope is Global Sort    Global Sort (first check)
2       isSortTable is false                                No Sort
3       isSortTable is true                                 Local Sort
Tips: isSortTable is true means the table contains a sort column, or it
contains dimensions (except complex types), such as string columns.
For example:
Create table xxx1 (col1 string, col2 int) stored by 'carbondata' -- sort table
Create table xx1 (col1 int, col2 int) stored by 'carbondata' -- not a sort table
Create table xx (col1 int, col2 string) stored by 'carbondata' tblproperties
('sort_columns'='col1') -- sort table
New logic: the sort scope comes from create table.
Number  Prerequisite                                        Code branch
1       isSortTable is true && sort scope is Global Sort    Global Sort (first check)
2       isSortTable is false || sort scope is No Sort       No Sort
3       isSortTable is true && sort scope is Local Sort     Local Sort
4       isSortTable is true, sort scope not specified       Local Sort (keeps current logic)
3   Acceptance standard
Number  Acceptance standard
1       Users can specify the sort scope (Global Sort, Local Sort, No Sort)
        when creating a carbon table via SQL.
2       Load data will ignore the sort scope specified in load options and
        will use the parameter specified in the create table command. If users
        still specify a sort scope in load options, a warning will inform them
        that the sort scope from create table will be used.
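
To illustrate acceptance point 2, a load that still passes a sort scope would
only trigger a warning (the table name and input path below are placeholders):

```sql
-- Suppose the table was created with 'SORT_SCOPE'='LOCAL_SORT'.
-- The load-level option below would be ignored, with a warning:
LOAD DATA INPATH 'hdfs://hacluster/data/sample.csv'
INTO TABLE tableWithLocalSort
OPTIONS ('SORT_SCOPE'='GLOBAL_SORT')
```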

Here is my JIRA: https://issues.apache.org/jira/browse/CARBONDATA-1438

You can see my simple design above.

But I am undecided between two options for loading data when a sort scope is
specified.

Option 1: Same as the design document: ignore the sort scope specified in load
options, give a warning message, and use the sort scope specified in the
create table command. If the table was created without a sort scope, the sort
scope specified in load options is still never used.

Option 2: The sort scope in the create table command has higher priority than
the sort scope specified in load options, which means that if the table was
created without a sort scope, the sort scope specified in load options will be
used.

Any ideas about these two options?

Regards.
Chenerlu.






Re: Apache CarbonData 6th meetup in Shanghai on 2nd Sep,2017 at : https://jinshuju.net/f/X8x5S9?from=timeline

2017-08-23 Thread Erlu Chen
Looking forward to the conference!

Regards.
Chenerlu



--
View this message in context: 
http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/Apache-CarbonData-6th-meetup-in-Shanghai-on-2nd-Sep-2017-at-https-jinshuju-net-f-X8x5S9-from-timeline-tp20693p20731.html
Sent from the Apache CarbonData Dev Mailing List archive mailing list archive 
at Nabble.com.


Re: [DISCUSSION] About data backward compatibility

2017-08-17 Thread Erlu Chen
Agree with caolu; I think users may be confused by many formats.

In the future, it would be better for carbon to unify the data format. The
unified format should be compatible with previous formats. If it is
unavoidable to introduce different formats to support different use cases for
better performance, I think we can add configuration parameters to this
unified format.

The key point is that CarbonData should have only one format.
That would be easier for users to understand and also easier for developers to
extend.


Regards.
Chenerlu.





--
View this message in context: 
http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/DISCUSSION-About-data-backward-compatibility-tp20183p20423.html
Sent from the Apache CarbonData Dev Mailing List archive mailing list archive 
at Nabble.com.


Re: problem with branch-1.1

2017-07-12 Thread Erlu Chen
The key point is that the versions of Spark and CarbonData should match.

Regards.
Chenerlu.



--
View this message in context: 
http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/problem-with-branch-1-1-tp16004p18107.html
Sent from the Apache CarbonData Dev Mailing List archive mailing list archive 
at Nabble.com.


Re: [VOTE] Apache CarbonData 1.1.1(RC1) release

2017-07-09 Thread Erlu Chen
+1

Regards.
Chenerlu.



--
View this message in context: 
http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/VOTE-Apache-CarbonData-1-1-1-RC1-release-tp17531p17715.html
Sent from the Apache CarbonData Dev Mailing List archive mailing list archive 
at Nabble.com.


Re: [DISCUSSION] Propose to remove "support spark 1.5" from CarbonData 1.2.0 onwards

2017-07-09 Thread Erlu Chen
I think it is OK to support Spark 1.5 without IUD for now.
If users upgrade their Spark version to 2.1 or 2.2 in the future, we can
remove Spark 1.5 support once few users remain on it.
Is there any special reason leading to this removal?

Regards.
Chenerlu.



--
View this message in context: 
http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/DISCUSSION-Propose-to-remove-support-spark-1-5-from-CarbonData-1-2-0-onwards-tp17662p17714.html
Sent from the Apache CarbonData Dev Mailing List archive mailing list archive 
at Nabble.com.


Re: [Discussion] CarbonOutputFormat Implementation

2017-07-04 Thread Erlu Chen
Hi Divya

Thanks for your suggestion.

Carbondata may support it in the near future.

If you want to contribute this feature, I think it will benefit the community
a lot.


Regards.
Chenerlu.



--
View this message in context: 
http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/Discussion-CarbonOutputFormat-Implementation-tp17113p17214.html
Sent from the Apache CarbonData Dev Mailing List archive mailing list archive 
at Nabble.com.


Re: Integrate Document Checker

2017-07-03 Thread Erlu Chen
Hi Jatin,

Agree with you.

The CarbonData community needs such useful tools to improve the quality of the
documentation.

Thanks.

Regards.
Chenerlu.



--
View this message in context: 
http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/Integrate-Document-Checker-tp17125p17180.html
Sent from the Apache CarbonData Dev Mailing List archive mailing list archive 
at Nabble.com.


Re: Difference in decimal values for variance in Presto.

2017-07-03 Thread Erlu Chen
Hi

OK, thanks very much.

If you find something wrong in CarbonData, we can discuss it here.

Regards.
Chenerlu.



--
View this message in context: 
http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/Difference-in-decimal-values-for-variance-in-Presto-tp16496p17178.html
Sent from the Apache CarbonData Dev Mailing List archive mailing list archive 
at Nabble.com.


Re: problem with branch-1.1

2017-06-23 Thread Erlu Chen
Hi,

Please try:

mvn package -DskipTests -Pspark-2.1 -Dspark.version=2.1.0 -Phadoop-2.7.2

with Hadoop 2.7.2 and Spark 2.1.

I have just tested it; it compiles OK.

[INFO] Reactor Summary:
[INFO]
[INFO] Apache CarbonData :: Parent ........................ SUCCESS [  1.657 s]
[INFO] Apache CarbonData :: Common ........................ SUCCESS [  1.870 s]
[INFO] Apache CarbonData :: Core .......................... SUCCESS [ 25.003 s]
[INFO] Apache CarbonData :: Processing .................... SUCCESS [  1.941 s]
[INFO] Apache CarbonData :: Hadoop ........................ SUCCESS [  2.017 s]
[INFO] Apache CarbonData :: Spark Common .................. SUCCESS [ 20.622 s]
[INFO] Apache CarbonData :: Spark2 ........................ SUCCESS [ 39.956 s]
[INFO] Apache CarbonData :: Spark Common Test ............. SUCCESS [  4.024 s]
[INFO] Apache CarbonData :: Assembly ...................... SUCCESS [  3.400 s]
[INFO] Apache CarbonData :: Spark2 Examples ............... SUCCESS [  9.718 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 01:50 min
[INFO] Finished at: 2017-06-23T17:55:28+08:00
[INFO] Final Memory: 83M/860M
[INFO] ------------------------------------------------------------------------

bogon:carbondata erlu$ git branch
* branch-1.1

Regards.
Chenerlu




--
View this message in context: 
http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/problem-with-branch-1-1-tp16004p16016.html
Sent from the Apache CarbonData Dev Mailing List archive mailing list archive 
at Nabble.com.


Re: [DISCUSSION] Whether Carbondata should keep carbon-spark-shell script

2017-06-19 Thread Erlu Chen
Hi community

Any comments on this topic?

If there are no objections, I will raise a PR to remove this feature.


Regards
Chenerlu.



--
View this message in context: 
http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/DISCUSSION-Whether-Carbondata-should-keep-carbon-spark-shell-script-tp14077p15455.html
Sent from the Apache CarbonData Dev Mailing List archive mailing list archive 
at Nabble.com.


Re: can't apply mappartitions to dataframe generated from carboncontext

2017-06-18 Thread Erlu Chen
Hi

Can you share your test steps for reproducing this issue?

I mean the complete test steps.

Thanks.
Chenerlu.



--
View this message in context: 
http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/can-t-apply-mappartitions-to-dataframe-generated-from-carboncontext-tp14565p15426.html
Sent from the Apache CarbonData Dev Mailing List archive mailing list archive 
at Nabble.com.


Re: 答复: can't apply mappartitions to dataframe generated from carboncontext

2017-06-16 Thread Erlu Chen
Hi

I think you can debug from Windows by adding some debug parameters when
starting spark-shell on Linux.

This is what is called remote debugging.

I tried this method when I used Windows; I hope this helps you.

Regards.
Chenerlu.



--
View this message in context: 
http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/can-t-apply-mappartitions-to-dataframe-generated-from-carboncontext-tp14565p15369.html
Sent from the Apache CarbonData Dev Mailing List archive mailing list archive 
at Nabble.com.


Re: Some questions about compiling carbondata

2017-06-15 Thread Erlu Chen
Hi 

For question one, I have raised a discussion about carbon-spark-shell for
Spark 2.x at the following link:

http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/DISCUSSION-Whether-Carbondata-should-keep-carbon-spark-shell-script-td14077.html

Actually there is a PR to fix carbon-spark-shell for Spark 2.x, but I think
this script is useless.

You can give your opinion, and we can vote on this.

Regards.
Chenerlu 



--
View this message in context: 
http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/Some-questions-about-compiling-carbondata-tp4498p15139.html
Sent from the Apache CarbonData Dev Mailing List archive mailing list archive 
at Nabble.com.


Re: can't apply mappartitions to dataframe generated from carboncontext

2017-06-11 Thread Erlu Chen
Hi, Mic sun

Can you paste your error message directly?

It seems I can't access your attachment.


Thanks in advance.

Regards.
Chenerlu.



--
View this message in context: 
http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/can-t-apply-mappartitions-to-dataframe-generated-from-carboncontext-tp14565p14570.html
Sent from the Apache CarbonData Dev Mailing List archive mailing list archive 
at Nabble.com.


Re: [DISCUSSION] Whether Carbondata should support Spark-2.2 in the next release version(1.2.0)

2017-06-10 Thread Erlu Chen
Hi, xm_zzc

I support CarbonData 1.2.0 + Spark 2.1, because Spark 2.2, having just been
released, may not be stable yet.

Regards.
Chenerlu.



--
View this message in context: 
http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/DISCUSSION-Whether-Carbondata-should-support-Spark-2-2-in-the-next-release-version-1-2-0-tp14332p14456.html
Sent from the Apache CarbonData Dev Mailing List archive mailing list archive 
at Nabble.com.


Re: [DISCUSSION] Whether Carbondata should keep carbon-sql-shell script

2017-06-08 Thread Erlu Chen
Hi,  Ravindra.

Users can learn how to use CarbonData through the QUICK START document.

Users should know how it works; this script just simplifies the steps of
getting an existing CarbonSession.

This is carbon API usage. I think the community would spend much time
maintaining this script, which would do more harm than good.

Currently carbon-spark-shell has some problems integrating with Spark 2.1.

What you said applies to carbon-spark-sql rather than carbon-spark-shell,
because the former provides a means of SQL command usage.


Regards, 
Chenerlu. 



--
View this message in context: 
http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/DISCUSSION-Whether-Carbondata-should-keep-carbon-spark-shell-script-tp14077p14217.html
Sent from the Apache CarbonData Dev Mailing List archive mailing list archive 
at Nabble.com.


Re: [DISCUSSION] Whether Carbondata should keep carbon-sql-shell script

2017-06-06 Thread Erlu Chen
Thanks for correcting my mistake.

Yes, just carbon-spark-shell. I think carbon-spark-sql is more helpful than
carbon-spark-shell, because it provides a way to interact with CarbonData via
SQL commands rather than the carbon API.

Based on the above, I think CarbonData can still keep carbon-spark-sql.

Regards.
Chenerlu.



--
View this message in context: 
http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/DISCUSSION-Whether-Carbondata-should-keep-carbon-spark-shell-script-tp14077p14087.html
Sent from the Apache CarbonData Dev Mailing List archive mailing list archive 
at Nabble.com.


[DISCUSSION] Whether Carbondata should keep carbon-sql-shell script

2017-06-06 Thread Erlu Chen
Hi community,

Recently, I reviewed the implementation of carbon-sql-shell and tried to
understand the function of this script.

This script just wraps a few steps and provides an existing CarbonContext or
CarbonSession for users to interact with CarbonData.

My opinion is that we can remove this script, because it is useless except for
providing an existing CarbonContext or CarbonSession.

Reasons as below:
1. CarbonData now has integrations with Spark 1.x and Spark 2.x, so
carbon-spark-shell would need refactoring every time Spark is updated.
2. After running this script, redundant folders are generated in the project,
and users may forget to remove them.
3. The CarbonContext or CarbonSession may be created with a store path and
metastore path that the user may not want.

I am just sharing my idea; we can discuss whether we should keep this script.

Thanks.

Regards.
Chenerlu.







--
View this message in context: 
http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/DISCUSSION-Whether-Carbondata-should-keep-carbon-sql-shell-script-tp14077.html
Sent from the Apache CarbonData Dev Mailing List archive mailing list archive 
at Nabble.com.


Re: [jira] [Created] (CARBONDATA-1114) Failed to run tests in windows env

2017-06-02 Thread Erlu Chen
Hi, xuchuanyin 

I think many of the failed test cases may be caused by Windows paths being
different from Linux paths.

I have tested on my Mac in local mode.

All the test cases you mentioned pass.

Regards.
Chenerlu.



--
View this message in context: 
http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/jira-Created-CARBONDATA-1114-Failed-to-run-tests-in-windows-env-tp13531p13644.html
Sent from the Apache CarbonData Dev Mailing List archive mailing list archive 
at Nabble.com.