Updated release notes. Re: [ANNOUNCE] Apache CarbonData 1.4.0 release

2018-06-05 Thread Liang Chen
Hi

Please find the updated 1.4.0 release notes:
https://cwiki.apache.org/confluence/display/CARBONDATA/Apache+CarbonData+1.4.0+Release

Regards
Liang





Re: carbondata partitioned by date generates many small files

2018-06-05 Thread 陈星宇
Hi Li,
Yes, I got the partition folder as you said, but under the partition folder
there are many small files, as in the following picture (not preserved in the archive).
How can they be merged automatically after the job is done? (See the sketch below.)



thanks


ChenXingYu
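For the merge question above, CarbonData provides segment compaction, which can be run manually after loads or triggered automatically. A minimal sketch, reusing the placeholder table name from the DDL in this thread; whether compaction applies to standard partition tables in 1.3.1 should be verified against the release notes:

-- Merge small segments produced by separate loads
-- (xx.xx is the placeholder table name from the DDL below).
ALTER TABLE xx.xx COMPACT 'MINOR'   -- merges recently loaded small segments
ALTER TABLE xx.xx COMPACT 'MAJOR'   -- merges segments up to a size threshold

-- To merge automatically after each load, set in carbon.properties
-- (documented CarbonData property names; verify for your version):
--   carbon.enable.auto.load.merge=true
--   carbon.compaction.level.threshold=4,3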
 
 
-- Original --
From:  "Jacky Li";
Date:  Tue, Jun 5, 2018 08:43 PM
To:  "dev"; 

Subject: Re: carbondata partitioned by date generates many small files

 
Hi,


There is a test case in StandardPartitionTableQueryTestCase that uses a date
column as the partition column; if you run it, the generated partition folders
look like the following picture (not preserved in the archive).
 


Are you getting similar folders?


Regards,
Jacky

On June 5, 2018, at 2:49 PM, 陈星宇 wrote:

Hi CarbonData team,


I am using CarbonData 1.3.1 to create a table and import data. The load
generates many small files and the Spark job is very slow. I suspect the
number of files is related to the number of Spark jobs, but if I decrease the
number of jobs, the load fails with an OutOfMemoryError. See my DDL below:


create table xx.xx(
dept_name string,
xx
...
) PARTITIONED BY (xxx date)
STORED BY 'carbondata' TBLPROPERTIES('SORT_COLUMNS'='xxx,xxx,xxx,xxx,xxx')



Please give some advice.


thanks


ChenXingYu

Re: [Discussion] Carbon Local Dictionary Support

2018-06-05 Thread Jacky Li
+1.
A good feature to add to CarbonData.

Regards,
Jacky


> On June 4, 2018, at 11:10 PM, Kumar Vishal wrote:
> 
> Hi Community,
> 
> Currently CarbonData supports either a global dictionary or no-dictionary
> (plain text stored in LV format) encoding for dimension column data.
> 
> *Bottlenecks with the global dictionary*
> 
> 1. The dictionary file is mutable, so a global dictionary cannot be
>    supported on storage that does not support append.
> 2. When a table has many columns, it is difficult for the user to decide
>    which columns should be dictionary encoded.
> 3. Global dictionary generation generally slows down the load process.
> 
> *Bottlenecks with no-dictionary*
> 
> 1. Storage size is high.
> 2. Queries on no-dictionary columns are slower, as more data is read and
>    processed.
> 3. Filtering on no-dictionary columns is slower, as the number of
>    comparisons is high.
> 4. The memory footprint is high.
> 
> The above bottlenecks can be solved by *generating a local dictionary for
> low-cardinality columns at each blocklet level*, which brings the following
> benefits:
> 
> 1. Dictionary generation works on any storage environment, irrespective of
>    the operations (such as append) it supports on files.
> 2. It removes the extra read/write IO on the dictionary files generated for
>    a global dictionary.
> 3. It frees the user from having to identify dictionary columns when a
>    table has many columns.
> 4. It gives better compression on low-cardinality dimension columns.
> 5. Filter queries on no-dictionary columns with a local dictionary are
>    faster, since filters are evaluated on encoded data.
> 6. It reduces store size and memory footprint, since only unique values are
>    stored in the local dictionary and the column data is stored encoded.
> 
> Please provide your comments. Any suggestion from the community is most
> welcome. Please let me know if anything needs clarification.
> 
> Regards,
> Kumar Vishal
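
For illustration only, here is one possible DDL surface for such a per-blocklet (local) dictionary. The proposal above does not define any syntax, so the property names below are hypothetical, styled after existing CarbonData TBLPROPERTIES:

CREATE TABLE sales (
  dept_name string,
  city string,
  amount double
)
STORED BY 'carbondata'
TBLPROPERTIES (
  'LOCAL_DICTIONARY_ENABLE'='true',            -- hypothetical: build a dictionary per blocklet
  'LOCAL_DICTIONARY_THRESHOLD'='10000',        -- hypothetical: fall back to plain encoding above this cardinality
  'LOCAL_DICTIONARY_INCLUDE'='dept_name,city'  -- hypothetical: restrict to named columns
)

Under such a scheme each blocklet stores its own value-to-code map, so the dictionary stays immutable per blocklet and append-only storage is sufficient.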





[GitHub] carbondata-site pull request #61: fixed the issues of content on website

2018-06-05 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/carbondata-site/pull/61


---


[GitHub] carbondata-site pull request #61: fixed the issues for content on website

2018-06-05 Thread vandana7
GitHub user vandana7 opened a pull request:

https://github.com/apache/carbondata-site/pull/61

fixed the issues for content on website



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vandana7/carbondata-site content-issue

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata-site/pull/61.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #61


commit acd4dd2895995d67876b7748fb03e33600e9202a
Author: vandana7 
Date:   2018-06-05T08:55:02Z

fixed issue with content




---


[GitHub] carbondata-site issue #60: fixed index for the documentation

2018-06-05 Thread vandana7
Github user vandana7 commented on the issue:

https://github.com/apache/carbondata-site/pull/60
  
creating a new PR for the same


---


[GitHub] carbondata-site pull request #60: fixed index for the documentation

2018-06-05 Thread vandana7
Github user vandana7 closed the pull request at:

https://github.com/apache/carbondata-site/pull/60


---


[GitHub] carbondata-site pull request #60: fixed index for the documentation

2018-06-05 Thread vandana7
GitHub user vandana7 opened a pull request:

https://github.com/apache/carbondata-site/pull/60

fixed index for the documentation



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vandana7/carbondata-site fix-index

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata-site/pull/60.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #60


commit 7f46b2f8c51d5c2ef10381a3c1a96fc2420f
Author: vandana7 
Date:   2018-06-05T07:48:46Z

fixed index for the documentation




---


carbondata partitioned by date generates many small files

2018-06-05 Thread 陈星宇
Hi CarbonData team,


I am using CarbonData 1.3.1 to create a table and import data. The load
generates many small files and the Spark job is very slow. I suspect the
number of files is related to the number of Spark jobs, but if I decrease the
number of jobs, the load fails with an OutOfMemoryError. See my DDL below:


create table xx.xx(
dept_name string,
xx
...
) PARTITIONED BY (xxx date)
STORED BY 'carbondata' TBLPROPERTIES('SORT_COLUMNS'='xxx,xxx,xxx,xxx,xxx')



Please give some advice.


thanks


ChenXingYu
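
One way to bound the number of files written per load is global sort, which CarbonData exposes through the SORT_SCOPE table property and the GLOBAL_SORT_PARTITIONS load option. A hedged sketch with the placeholder names from the DDL above; availability on 1.3.1 and on standard partition tables should be checked against the docs:

create table xx.xx(
dept_name string
-- ... remaining columns elided as in the original DDL
) PARTITIONED BY (xxx date)
STORED BY 'carbondata'
TBLPROPERTIES('SORT_COLUMNS'='xxx,xxx,xxx,xxx,xxx',
              'SORT_SCOPE'='GLOBAL_SORT')

LOAD DATA INPATH 'hdfs://path/to/input'   -- hypothetical input path
INTO TABLE xx.xx
OPTIONS('GLOBAL_SORT_PARTITIONS'='8')     -- illustrative value; caps files per load

Global sort trades some load speed for fewer, larger files; compaction (see the reply thread above) remains an option after the fact.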

carbondata partitioned by date generates many small files

2018-06-05 Thread 陈星宇
hi

Re: [Discussion] Carbon Local Dictionary Support

2018-06-05 Thread xm_zzc
Hi,
  +1.
  This is an exciting feature; I hope to see it in version 1.5.


