Reduce the size of assembly jar

2018-10-07 Thread xm_zzc
Hi dev:
  Currently the assembly jar is about 100MB. I find that the
'com.amazonaws' package and the models folder (which contains many json
files) take up almost half of it. Can we remove the 'com.amazonaws' package
and the models folder by default when assembling?
  I tried adding 'com.amazonaws:*' to the pom.xml of the assembly module, and
the assembly jar shrank to about 45MB.
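
To check where the space in the jar goes, a small script can group entry sizes by package prefix. This is only a sketch: the function name `size_by_prefix` and the grouping depth are my own choices, and it reports compressed sizes as stored in the jar.

```python
import zipfile
from collections import Counter


def size_by_prefix(jar, depth=2):
    """Sum compressed entry sizes, grouped by the first `depth` directory levels.

    `jar` may be a path or a file-like object (anything zipfile.ZipFile accepts).
    """
    totals = Counter()
    with zipfile.ZipFile(jar) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            # Directory components only, truncated to `depth` levels.
            dirs = info.filename.split("/")[:-1][:depth]
            totals["/".join(dirs) or "(root)"] += info.compress_size
    return totals


# Example use (the jar path is a placeholder, not a real file name):
# for prefix, size in size_by_prefix("path/to/assembly.jar").most_common(10):
#     print(f"{size / 1024 / 1024:8.1f} MB  {prefix}")
```

With depth=2, everything under com/amazonaws is reported as one line, and the models folder as another, so the two can be compared against the total directly.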



--
Sent from: 
http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/


Re: Change the 'comment' content for column when execute command 'desc formatted table_name'

2018-10-07 Thread xm_zzc
Hi:
  I agree with Jacky. 
  Currently, if I create a table with the default blocklet size (64 MB) and
load some data into it, and the default blocklet size is later changed to
128 MB, the change will affect the table created before. Is that right? I
think tables created before should still use 64 MB as their blocklet size.

These properties, whether specified by the user or taken from the default
value, need to be saved when the table is created:
|Property                    |Value      |Default value|
|Blocklet Size               |64 MB      |64 MB        |
|Table Block Size            |1024 MB    |1024 MB      |
|SORT_SCOPE                  |LOCAL_SORT |LOCAL_SORT   |
|CACHE_LEVEL                 |BLOCKLET   |BLOCK        |
|AUTO_LOAD_MERGE             |true       |false        |
|COMPACTION_LEVEL_THRESHOLD  |2,8        |4,3          |
|COMPACTION_PRESERVE_SEGMENTS|0          |0            |
|ALLOWED_COMPACTION_DAYS     |0          |0            |
|MAJOR_COMPACTION_SIZE       |3072 MB    |1024 MB      |
|Local Dictionary Enabled    |false      |false        |
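
To illustrate why the effective values should be saved at create time, here is a minimal sketch. The function `effective_properties` and the key names are hypothetical, not CarbonData identifiers; the point is only that a stored snapshot does not move when the defaults change later.

```python
def effective_properties(user_props, defaults):
    """Return the full property set to persist when the table is created.

    User-specified values win; everything else is filled from the defaults
    that are in force at creation time.
    """
    merged = dict(defaults)
    merged.update(user_props)
    return merged


# Defaults in force when the table is created (illustrative keys and values).
defaults_old = {"blocklet_size_mb": 64, "table_block_size_mb": 1024}
stored = effective_properties({}, defaults_old)  # saved with the table

# Later the default changes, but the stored snapshot is unaffected.
defaults_new = {"blocklet_size_mb": 128, "table_block_size_mb": 1024}
assert stored["blocklet_size_mb"] == 64
```

A table created after the change would pick up 128 MB, while the older table keeps reading 64 MB from its own saved properties.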

Hi Jacky:
  I think we need to refactor the CarbonCli module and move some common tools
to the core module, so that both the CarbonCli module and the Spark2 module
can use them, right?






Re: Change the 'comment' content for column when execute command 'desc formatted table_name'

2018-10-07 Thread Jacky Li
Looking at the DESC FORMATTED command again, I still feel the table property
section is not very clear.
For table properties, I do not think it is good for the DESC command to print
the default value when the user did not specify one when creating the table.
Because the default values in the CarbonCommonConstants file may change from
version to version, I think it is better to always write the default value
to the table properties (in the schema file) when loading the table. Then the
DESC command can always get the table properties from the schema file.

So I suggest we do the following:
1. Categorize the properties into file level, table level, and system level.
2. Write the file level properties into the data file's footer, including all
file level properties, whether specified by the user or taken from the
default value.
3. Write the table level properties into the schema file, including all table
level properties, whether specified by the user or taken from the default
value.
4. The DESC command should print the properties read from the schema file,
which should contain all table level properties.
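
The four steps above could be sketched roughly as follows. Everything here is an assumption for illustration: the `PROPERTY_LEVEL` mapping, the property names, and which property belongs to which level are placeholders, not the actual classification.

```python
from enum import Enum


class Level(Enum):
    FILE = "file"
    TABLE = "table"
    SYSTEM = "system"


# Step 1: hypothetical classification; the real split still needs to be decided.
PROPERTY_LEVEL = {
    "blocklet_size": Level.FILE,
    "sort_scope": Level.TABLE,
    "cache_level": Level.TABLE,
}


def split_by_level(user_props, defaults):
    """Steps 2 and 3: resolve every property, then route it by level.

    Returns (footer_props, schema_props). System level properties are not
    persisted with the file or the table.
    """
    effective = {**defaults, **user_props}
    footer, schema = {}, {}
    for name, value in effective.items():
        level = PROPERTY_LEVEL.get(name, Level.SYSTEM)
        if level is Level.FILE:
            footer[name] = value
        elif level is Level.TABLE:
            schema[name] = value
    return footer, schema


def desc_table_properties(schema):
    """Step 4: DESC prints only what the schema file holds, never live defaults."""
    return sorted(schema.items())
```

Because DESC reads only the persisted schema properties, its output stays the same even if the defaults shipped with a later version differ.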

Another suggestion: besides printing the schema and table properties like the
standard Hive DESC command, we can introduce another command that prints the
output of the CarbonCli tool for more profiling and debugging information,
such as how many files the table contains, the average size of a
page/blocklet, the min/max percentage, etc. For example, the syntax of this
command could be "SUMMARY table_name".
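
As a rough idea of what such a summary could compute, here is a sketch over already-collected file metadata. The input shape (a list of dicts with a `blocklet_sizes` list) and the function name `summarize` are hypothetical; this is not how CarbonCli represents its data.

```python
from statistics import mean


def summarize(files):
    """Aggregate simple table statistics from per-file metadata.

    `files` is a list of dicts, each with a "blocklet_sizes" list (in bytes).
    """
    blocklet_sizes = [size for f in files for size in f["blocklet_sizes"]]
    return {
        "num_files": len(files),
        "num_blocklets": len(blocklet_sizes),
        "avg_blocklet_size": mean(blocklet_sizes) if blocklet_sizes else 0,
    }
```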

Regards,
Jacky


