akashrn5 commented on a change in pull request #4210:
URL: https://github.com/apache/carbondata/pull/4210#discussion_r728631522
##########
File path: docs/configuration-parameters.md
##########
@@ -52,6 +52,17 @@ This section provides the details of all the configurations
required for the Car
| carbon.trash.retention.days | 7 | This parameter specifies the number of
days after which the timestamp based subdirectories are expired in the trash
folder. Allowed Min value = 0, Allowed Max Value = 365 days|
| carbon.clean.file.force.allowed | false | This parameter specifies if the
clean files operation with force option is allowed or not.|
| carbon.cdc.minmax.pruning.enabled | false | This parameter defines whether
the min max pruning to be performed on the target table based on the source
data. It will be useful when data is not sparse across target table which
results in better pruning.|
+| spark.sql.warehouse.dir | ../carbon.store | This parameter defines the path
on DFS where carbondata files and metadata will be stored. The configuration
`carbon.storelocation` has been deprecated. For simplicity, we recommended you
remove the configuration of `carbon.storelocation`. If `carbon.storelocation`
and `spark.sql.warehouse.dir` are configured to different paths, exception will
be thrown when CREATE DATABASE and DROP DATABASE to avoid inconsistent database
location.|
Review comment:
`spark.sql.warehouse.dir` is a Spark property, so there is no need to document it
here. Also, there is already a note in the document saying
`carbon.storelocation` is deprecated.
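If it needs mentioning anywhere, it is plain Spark configuration set while
building the session, something like this (a minimal sketch; app name and path
are hypothetical):
```scala
import org.apache.spark.sql.SparkSession

// spark.sql.warehouse.dir is ordinary Spark configuration, set on the session builder.
val spark = SparkSession.builder()
  .appName("carbon-example") // hypothetical app name
  .config("spark.sql.warehouse.dir", "hdfs://namenode:8020/user/warehouse") // hypothetical DFS path
  .getOrCreate()
```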
##########
File path: docs/configuration-parameters.md
##########
@@ -52,6 +52,17 @@ This section provides the details of all the configurations
required for the Car
| carbon.trash.retention.days | 7 | This parameter specifies the number of
days after which the timestamp based subdirectories are expired in the trash
folder. Allowed Min value = 0, Allowed Max Value = 365 days|
| carbon.clean.file.force.allowed | false | This parameter specifies if the
clean files operation with force option is allowed or not.|
| carbon.cdc.minmax.pruning.enabled | false | This parameter defines whether
the min max pruning to be performed on the target table based on the source
data. It will be useful when data is not sparse across target table which
results in better pruning.|
+| spark.sql.warehouse.dir | ../carbon.store | This parameter defines the path
on DFS where carbondata files and metadata will be stored. The configuration
`carbon.storelocation` has been deprecated. For simplicity, we recommended you
remove the configuration of `carbon.storelocation`. If `carbon.storelocation`
and `spark.sql.warehouse.dir` are configured to different paths, exception will
be thrown when CREATE DATABASE and DROP DATABASE to avoid inconsistent database
location.|
+| carbon.blocklet.size | 64 MB | Carbondata files consist of blocklets which
further consists of column pages. As per the latest V3 format, the default size
of a blocklet is 64 MB. In V2 format, the default size of a blocklet was 120000
rows. |
+| carbon.properties.filepath | conf/carbon.properties | This file is by
default present in conf directory on your base project path. Users can
configure all the carbondata related properties in this file. |
+| carbon.date.format | yyyy-MM-dd | This property specifies the format in
which data will be stored in the column with DATE data type. |
Review comment:
If you say the data will be stored in the column in this format, it conveys
wrong info, because we store date as an integer (it is a direct dictionary
column) and time as a long. Instead, you can say that this property specifies
how carbondata parses all incoming date data before it is finally stored in the
carbondata file, or something like that.
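Something along these lines is how the parse format is actually supplied (a
sketch, assuming the usual `CarbonProperties` API):
```scala
import org.apache.carbondata.core.util.CarbonProperties

// Incoming date strings are parsed with this pattern before being stored
// internally as direct-dictionary integers.
CarbonProperties.getInstance()
  .addProperty("carbon.date.format", "yyyy-MM-dd")
```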
##########
File path: docs/configuration-parameters.md
##########
@@ -52,6 +52,17 @@ This section provides the details of all the configurations
required for the Car
| carbon.trash.retention.days | 7 | This parameter specifies the number of
days after which the timestamp based subdirectories are expired in the trash
folder. Allowed Min value = 0, Allowed Max Value = 365 days|
| carbon.clean.file.force.allowed | false | This parameter specifies if the
clean files operation with force option is allowed or not.|
| carbon.cdc.minmax.pruning.enabled | false | This parameter defines whether
the min max pruning to be performed on the target table based on the source
data. It will be useful when data is not sparse across target table which
results in better pruning.|
+| spark.sql.warehouse.dir | ../carbon.store | This parameter defines the path
on DFS where carbondata files and metadata will be stored. The configuration
`carbon.storelocation` has been deprecated. For simplicity, we recommended you
remove the configuration of `carbon.storelocation`. If `carbon.storelocation`
and `spark.sql.warehouse.dir` are configured to different paths, exception will
be thrown when CREATE DATABASE and DROP DATABASE to avoid inconsistent database
location.|
+| carbon.blocklet.size | 64 MB | Carbondata files consist of blocklets which
further consists of column pages. As per the latest V3 format, the default size
of a blocklet is 64 MB. In V2 format, the default size of a blocklet was 120000
rows. |
+| carbon.properties.filepath | conf/carbon.properties | This file is by
default present in conf directory on your base project path. Users can
configure all the carbondata related properties in this file. |
Review comment:
This is a system property that points to the file containing the carbon
properties, so I think this is not the right place to mention it. Maybe you can
add this info to the deployment guide or the quick-start instead.
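For reference, since it is a JVM system property, it is typically supplied on
the command line rather than inside the properties file it points to (a sketch;
the path is hypothetical):
```scala
// Equivalent to passing -Dcarbon.properties.filepath=... via
// spark.driver.extraJavaOptions; must be set before CarbonProperties is
// first initialized for it to take effect.
System.setProperty("carbon.properties.filepath", "/opt/carbon/conf/carbon.properties")
```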
##########
File path: docs/configuration-parameters.md
##########
@@ -151,6 +169,17 @@ This section provides the details of all the
configurations required for the Car
| carbon.partition.max.driver.lru.cache.size | -1 | Maximum memory **(in MB)**
upto which driver can cache partition metadata. Beyond this, least recently
used data will be removed from cache before loading new set of values.
| carbon.mapOrderPushDown.<db_name>_<table_name>.column| empty | If order by
column is in sort column, specify that sort column here to avoid ordering at
map task . |
| carbon.metacache.expiration.seconds | Long.MAX_VALUE | Expiration time **(in
seconds)** for tableInfo cache in CarbonMetadata and tableModifiedTime in
CarbonFileMetastore, after the time configured since last access to the cache
entry, tableInfo and tableModifiedTime will be removed from each cache. Recent
access will refresh the timer. Default value of Long.MAX_VALUE means the cache
will not be expired by time. **NOTE:** At the time when cache is being expired,
queries on the table may fail with NullPointerException. |
+| is.driver.instance | false | This parameter decides if LRU cache for storing
indexes need to be created on driver. By default, it is created on executors. |
+| carbon.input.metrics.update.interval | 500000 | This property determines the
number of records queried after which input metrics are updated to spark. |
+| carbon.use.bitset.pipe.line | true | Carbondata has various optimizations
for faster query execution. Setting this property acts like a catalyst for
filter queries. If set to true, the bitset is passed from one filter to
another, resulting in incremental filtering and improving overall performance |
+
+## Index Configuration
+| Parameter | Default Value | Description |
+|--------------------------------------|---------------|---------------------------------------------------|
+| is.internal.load.call | false | This parameter decides whether the insert
call is triggered internally or by the user. If triggered by user, this ensures
data does not get loaded into MV directly |
Review comment:
This is actually an internal property, so there is no need to add it to the doc.
##########
File path: docs/configuration-parameters.md
##########
@@ -151,6 +169,17 @@ This section provides the details of all the
configurations required for the Car
| carbon.partition.max.driver.lru.cache.size | -1 | Maximum memory **(in MB)**
upto which driver can cache partition metadata. Beyond this, least recently
used data will be removed from cache before loading new set of values.
| carbon.mapOrderPushDown.<db_name>_<table_name>.column| empty | If order by
column is in sort column, specify that sort column here to avoid ordering at
map task . |
| carbon.metacache.expiration.seconds | Long.MAX_VALUE | Expiration time **(in
seconds)** for tableInfo cache in CarbonMetadata and tableModifiedTime in
CarbonFileMetastore, after the time configured since last access to the cache
entry, tableInfo and tableModifiedTime will be removed from each cache. Recent
access will refresh the timer. Default value of Long.MAX_VALUE means the cache
will not be expired by time. **NOTE:** At the time when cache is being expired,
queries on the table may fail with NullPointerException. |
+| is.driver.instance | false | This parameter decides if LRU cache for storing
indexes need to be created on driver. By default, it is created on executors. |
+| carbon.input.metrics.update.interval | 500000 | This property determines the
number of records queried after which input metrics are updated to spark. |
Review comment:
For this one, please add that it can also be set dynamically within a session,
for example:
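A sketch of the dynamic form (the value is arbitrary):
```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().getOrCreate() // assumes a running CarbonData-enabled app

// Dynamically override the property for the current session only.
spark.sql("SET carbon.input.metrics.update.interval=100000")
```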
##########
File path: docs/configuration-parameters.md
##########
@@ -52,6 +52,17 @@ This section provides the details of all the configurations
required for the Car
| carbon.trash.retention.days | 7 | This parameter specifies the number of
days after which the timestamp based subdirectories are expired in the trash
folder. Allowed Min value = 0, Allowed Max Value = 365 days|
| carbon.clean.file.force.allowed | false | This parameter specifies if the
clean files operation with force option is allowed or not.|
| carbon.cdc.minmax.pruning.enabled | false | This parameter defines whether
the min max pruning to be performed on the target table based on the source
data. It will be useful when data is not sparse across target table which
results in better pruning.|
+| spark.sql.warehouse.dir | ../carbon.store | This parameter defines the path
on DFS where carbondata files and metadata will be stored. The configuration
`carbon.storelocation` has been deprecated. For simplicity, we recommended you
remove the configuration of `carbon.storelocation`. If `carbon.storelocation`
and `spark.sql.warehouse.dir` are configured to different paths, exception will
be thrown when CREATE DATABASE and DROP DATABASE to avoid inconsistent database
location.|
+| carbon.blocklet.size | 64 MB | Carbondata files consist of blocklets which
further consists of column pages. As per the latest V3 format, the default size
of a blocklet is 64 MB. In V2 format, the default size of a blocklet was 120000
rows. |
Review comment:
No need to mention V2 here, as no one is using it. For blocklet size you can
just say that each blocklet inside a block is 64 MB by default, and recommend
not to change it unless there is a specific use case or issue.
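If some specific use case does need a different size, it can be set per table
at create time, roughly like this (a sketch, assuming the `TABLE_BLOCKLET_SIZE`
table property; size in MB, table and columns hypothetical):
```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().getOrCreate()

// Hypothetical table; only the TBLPROPERTIES line matters here.
spark.sql(
  """CREATE TABLE IF NOT EXISTS blocklet_example (id INT, name STRING)
    |STORED AS carbondata
    |TBLPROPERTIES('TABLE_BLOCKLET_SIZE'='8')""".stripMargin)
```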
##########
File path: docs/configuration-parameters.md
##########
@@ -99,6 +110,12 @@ This section provides the details of all the configurations
required for the Car
| carbon.enable.bad.record.handling.for.insert | false | by default, disable
the bad record and converter step during "insert into" |
| carbon.load.si.repair | true | by default, enable loading for failed
segments in SI during load/insert command |
| carbon.si.repair.limit | (none) | Number of failed segments to be loaded in
SI when repairing missing segments in SI, by default load all the missing
segments. Supports value from 0 to 2147483646 |
+| carbon.complex.delimiter.level.1 | # | This delimiter is used for parsing
complex data type columns. Level 1 delimiter splits the complex type data
column in a row (eg., a\001b\001c --> Array = {a,b,c}). |
+| carbon.complex.delimiter.level.2 | $ | This delimiter splits the complex
type nested data column in a row. Applies level_1 delimiter & applies level_2
based on complex data type (eg., a\002b\001c\002d --> Array> = {{a,b},{c,d}}). |
+| carbon.complex.delimiter.level.3 | @ | This delimiter splits the complex
type nested data column in a row. Applies level_1 delimiter, applies level_2
and then level_3 delimiter based on complex data type. Used in case of nested
Complex Map type. (eg., 'a\003b\002b\003c\001aa\003bb\002cc\003dd' --> Array Of
Map> = {{a -> b, b -> c},{aa -> bb, cc -> dd}}). |
+| carbon.complex.delimiter.level.4 | (none) | All the levels of delimiters are
used for parsing complex data type columns. All the delimiters are applied
depending on the complexity of the given data type. Level 4 delimiter will be
used for parsing the complex values after level 3 delimiter has been applied
already. |
+| enable.unsafe.columnpage | true | This property enables creation of column
pages while writing on off heap (unsafe) memory. It is set by default |
+| carbon.lucene.compression.mode | speed | Carbondata supports different types
of indices for efficient queries. This parameter decides the compression mode
used by lucene index for index writing. In the default mode, writing speed is
given more priority rather than the index size. |
Review comment:
Please remove this one also, as it is already present in lucene-index-guide.md.
##########
File path: docs/configuration-parameters.md
##########
@@ -52,6 +52,17 @@ This section provides the details of all the configurations
required for the Car
| carbon.trash.retention.days | 7 | This parameter specifies the number of
days after which the timestamp based subdirectories are expired in the trash
folder. Allowed Min value = 0, Allowed Max Value = 365 days|
| carbon.clean.file.force.allowed | false | This parameter specifies if the
clean files operation with force option is allowed or not.|
| carbon.cdc.minmax.pruning.enabled | false | This parameter defines whether
the min max pruning to be performed on the target table based on the source
data. It will be useful when data is not sparse across target table which
results in better pruning.|
+| spark.sql.warehouse.dir | ../carbon.store | This parameter defines the path
on DFS where carbondata files and metadata will be stored. The configuration
`carbon.storelocation` has been deprecated. For simplicity, we recommended you
remove the configuration of `carbon.storelocation`. If `carbon.storelocation`
and `spark.sql.warehouse.dir` are configured to different paths, exception will
be thrown when CREATE DATABASE and DROP DATABASE to avoid inconsistent database
location.|
+| carbon.blocklet.size | 64 MB | Carbondata files consist of blocklets which
further consists of column pages. As per the latest V3 format, the default size
of a blocklet is 64 MB. In V2 format, the default size of a blocklet was 120000
rows. |
+| carbon.properties.filepath | conf/carbon.properties | This file is by
default present in conf directory on your base project path. Users can
configure all the carbondata related properties in this file. |
+| carbon.date.format | yyyy-MM-dd | This property specifies the format in
which data will be stored in the column with DATE data type. |
+| carbon.lock.class | (none) | This specifies the implementation of
ICarbonLock interface to be used for acquiring the locks in case of concurrent
operations |
+| carbon.local.dictionary.enable | (none) | If set to true, this property
enables the generation of local dictionary. Local dictionary enables to map
string and varchar values to numbers which helps in storing the data
efficiently. |
Review comment:
You can remove this property; it is already mentioned in the document
ddl-of-carbondata.md.
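For reference, ddl-of-carbondata.md covers it through the table property, along
these lines (a sketch; table and columns are hypothetical):
```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().getOrCreate()

// Local dictionary is controlled per table via TBLPROPERTIES.
spark.sql(
  """CREATE TABLE IF NOT EXISTS dict_example (name STRING, city STRING)
    |STORED AS carbondata
    |TBLPROPERTIES('LOCAL_DICTIONARY_ENABLE'='true')""".stripMargin)
```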
##########
File path: docs/configuration-parameters.md
##########
@@ -52,6 +52,17 @@ This section provides the details of all the configurations
required for the Car
| carbon.trash.retention.days | 7 | This parameter specifies the number of
days after which the timestamp based subdirectories are expired in the trash
folder. Allowed Min value = 0, Allowed Max Value = 365 days|
| carbon.clean.file.force.allowed | false | This parameter specifies if the
clean files operation with force option is allowed or not.|
| carbon.cdc.minmax.pruning.enabled | false | This parameter defines whether
the min max pruning to be performed on the target table based on the source
data. It will be useful when data is not sparse across target table which
results in better pruning.|
+| spark.sql.warehouse.dir | ../carbon.store | This parameter defines the path
on DFS where carbondata files and metadata will be stored. The configuration
`carbon.storelocation` has been deprecated. For simplicity, we recommended you
remove the configuration of `carbon.storelocation`. If `carbon.storelocation`
and `spark.sql.warehouse.dir` are configured to different paths, exception will
be thrown when CREATE DATABASE and DROP DATABASE to avoid inconsistent database
location.|
+| carbon.blocklet.size | 64 MB | Carbondata files consist of blocklets which
further consists of column pages. As per the latest V3 format, the default size
of a blocklet is 64 MB. In V2 format, the default size of a blocklet was 120000
rows. |
+| carbon.properties.filepath | conf/carbon.properties | This file is by
default present in conf directory on your base project path. Users can
configure all the carbondata related properties in this file. |
+| carbon.date.format | yyyy-MM-dd | This property specifies the format in
which data will be stored in the column with DATE data type. |
+| carbon.lock.class | (none) | This specifies the implementation of
ICarbonLock interface to be used for acquiring the locks in case of concurrent
operations |
+| carbon.local.dictionary.enable | (none) | If set to true, this property
enables the generation of local dictionary. Local dictionary enables to map
string and varchar values to numbers which helps in storing the data
efficiently. |
+| carbon.local.dictionary.decoder.fallback | true | Page Level data will not
be maintained for the blocklet. During fallback, actual data will be retrieved
from the encoded page data using local dictionary. NOTE: Memory footprint
decreases significantly as compared to when this property is set to false |
Review comment:
Same comment as above. First check whether it is already documented anywhere in
the project; if not, you can add it here, otherwise avoid the duplication.
##########
File path: docs/configuration-parameters.md
##########
@@ -52,6 +52,17 @@ This section provides the details of all the configurations
required for the Car
| carbon.trash.retention.days | 7 | This parameter specifies the number of
days after which the timestamp based subdirectories are expired in the trash
folder. Allowed Min value = 0, Allowed Max Value = 365 days|
| carbon.clean.file.force.allowed | false | This parameter specifies if the
clean files operation with force option is allowed or not.|
| carbon.cdc.minmax.pruning.enabled | false | This parameter defines whether
the min max pruning to be performed on the target table based on the source
data. It will be useful when data is not sparse across target table which
results in better pruning.|
+| spark.sql.warehouse.dir | ../carbon.store | This parameter defines the path
on DFS where carbondata files and metadata will be stored. The configuration
`carbon.storelocation` has been deprecated. For simplicity, we recommended you
remove the configuration of `carbon.storelocation`. If `carbon.storelocation`
and `spark.sql.warehouse.dir` are configured to different paths, exception will
be thrown when CREATE DATABASE and DROP DATABASE to avoid inconsistent database
location.|
+| carbon.blocklet.size | 64 MB | Carbondata files consist of blocklets which
further consists of column pages. As per the latest V3 format, the default size
of a blocklet is 64 MB. In V2 format, the default size of a blocklet was 120000
rows. |
+| carbon.properties.filepath | conf/carbon.properties | This file is by
default present in conf directory on your base project path. Users can
configure all the carbondata related properties in this file. |
+| carbon.date.format | yyyy-MM-dd | This property specifies the format in
which data will be stored in the column with DATE data type. |
+| carbon.lock.class | (none) | This specifies the implementation of
ICarbonLock interface to be used for acquiring the locks in case of concurrent
operations |
+| carbon.local.dictionary.enable | (none) | If set to true, this property
enables the generation of local dictionary. Local dictionary enables to map
string and varchar values to numbers which helps in storing the data
efficiently. |
+| carbon.local.dictionary.decoder.fallback | true | Page Level data will not
be maintained for the blocklet. During fallback, actual data will be retrieved
from the encoded page data using local dictionary. NOTE: Memory footprint
decreases significantly as compared to when this property is set to false |
+| spark.deploy.zookeeper.url | (none) | The zookeeper url to connect to for
using zookeeper based locking |
Review comment:
This is also a Spark property, no need to add it here.
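On the carbon side, only the lock type needs switching; the zookeeper quorum
itself is ordinary Spark configuration (a sketch; the quorum address is
hypothetical):
```scala
import org.apache.carbondata.core.util.CarbonProperties

// Switch CarbonData to zookeeper-based locking; spark.deploy.zookeeper.url is
// then passed as a normal Spark conf, e.g.
//   --conf spark.deploy.zookeeper.url=zk1:2181,zk2:2181
CarbonProperties.getInstance()
  .addProperty("carbon.lock.type", "ZOOKEEPERLOCK")
```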
##########
File path: docs/configuration-parameters.md
##########
@@ -52,6 +52,17 @@ This section provides the details of all the configurations
required for the Car
| carbon.trash.retention.days | 7 | This parameter specifies the number of
days after which the timestamp based subdirectories are expired in the trash
folder. Allowed Min value = 0, Allowed Max Value = 365 days|
| carbon.clean.file.force.allowed | false | This parameter specifies if the
clean files operation with force option is allowed or not.|
| carbon.cdc.minmax.pruning.enabled | false | This parameter defines whether
the min max pruning to be performed on the target table based on the source
data. It will be useful when data is not sparse across target table which
results in better pruning.|
+| spark.sql.warehouse.dir | ../carbon.store | This parameter defines the path
on DFS where carbondata files and metadata will be stored. The configuration
`carbon.storelocation` has been deprecated. For simplicity, we recommended you
remove the configuration of `carbon.storelocation`. If `carbon.storelocation`
and `spark.sql.warehouse.dir` are configured to different paths, exception will
be thrown when CREATE DATABASE and DROP DATABASE to avoid inconsistent database
location.|
+| carbon.blocklet.size | 64 MB | Carbondata files consist of blocklets which
further consists of column pages. As per the latest V3 format, the default size
of a blocklet is 64 MB. In V2 format, the default size of a blocklet was 120000
rows. |
+| carbon.properties.filepath | conf/carbon.properties | This file is by
default present in conf directory on your base project path. Users can
configure all the carbondata related properties in this file. |
+| carbon.date.format | yyyy-MM-dd | This property specifies the format in
which data will be stored in the column with DATE data type. |
+| carbon.lock.class | (none) | This specifies the implementation of
ICarbonLock interface to be used for acquiring the locks in case of concurrent
operations |
+| carbon.local.dictionary.enable | (none) | If set to true, this property
enables the generation of local dictionary. Local dictionary enables to map
string and varchar values to numbers which helps in storing the data
efficiently. |
+| carbon.local.dictionary.decoder.fallback | true | Page Level data will not
be maintained for the blocklet. During fallback, actual data will be retrieved
from the encoded page data using local dictionary. NOTE: Memory footprint
decreases significantly as compared to when this property is set to false |
+| spark.deploy.zookeeper.url | (none) | The zookeeper url to connect to for
using zookeeper based locking |
+| carbon.data.file.version | V3 | This specifies carbondata file format
version. Carbondata file format has evolved with time from V1 to V3 in terms of
metadata storage and IO level pruning capabilities. You can find more details
[here](https://carbondata.apache.org/file-structure-of-carbondata.html#carbondata-file-format).
|
+| spark.carbon.hive.schema.store | false | Carbondata currently supports 2
different types of metastores for storing schemas. This property specifies if
Hive metastore is to be used for storing and retrieving table schemas |
+| spark.carbon.sqlastbuilder.classname |
`org.apache.spark.sql.hive.CarbonSqlAstBuilder` | Carbondata extension of
spark's `SparkSqlAstBuilder` that converts an ANTLR ParseTree into a logical
plan. |
Review comment:
I think there is no need to mention this, because just configuring the carbon
extensions class is enough for carbon to work, so documenting this will simply
confuse the user.
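For the user, configuring the extensions class alone is the documented path,
roughly (a sketch):
```scala
import org.apache.spark.sql.SparkSession

// Registering CarbonExtensions is enough; the AST builder is wired up internally.
val spark = SparkSession.builder()
  .config("spark.sql.extensions", "org.apache.spark.sql.CarbonExtensions")
  .getOrCreate()
```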
##########
File path: docs/configuration-parameters.md
##########
@@ -99,6 +110,12 @@ This section provides the details of all the configurations
required for the Car
| carbon.enable.bad.record.handling.for.insert | false | by default, disable
the bad record and converter step during "insert into" |
| carbon.load.si.repair | true | by default, enable loading for failed
segments in SI during load/insert command |
| carbon.si.repair.limit | (none) | Number of failed segments to be loaded in
SI when repairing missing segments in SI, by default load all the missing
segments. Supports value from 0 to 2147483646 |
+| carbon.complex.delimiter.level.1 | # | This delimiter is used for parsing
complex data type columns. Level 1 delimiter splits the complex type data
column in a row (eg., a\001b\001c --> Array = {a,b,c}). |
+| carbon.complex.delimiter.level.2 | $ | This delimiter splits the complex
type nested data column in a row. Applies level_1 delimiter & applies level_2
based on complex data type (eg., a\002b\001c\002d --> Array> = {{a,b},{c,d}}). |
+| carbon.complex.delimiter.level.3 | @ | This delimiter splits the complex
type nested data column in a row. Applies level_1 delimiter, applies level_2
and then level_3 delimiter based on complex data type. Used in case of nested
Complex Map type. (eg., 'a\003b\002b\003c\001aa\003bb\002cc\003dd' --> Array Of
Map> = {{a -> b, b -> c},{aa -> bb, cc -> dd}}). |
+| carbon.complex.delimiter.level.4 | (none) | All the levels of delimiters are
used for parsing complex data type columns. All the delimiters are applied
depending on the complexity of the given data type. Level 4 delimiter will be
used for parsing the complex values after level 3 delimiter has been applied
already. |
+| enable.unsafe.columnpage | true | This property enables creation of column
pages while writing on off heap (unsafe) memory. It is set by default |
Review comment:
You can remove this, as it is already present in usecases.md. Alternatively,
you can just copy the same description here as well.
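If you do copy it, note it is a plain carbon property (a sketch; shown here
being disabled, i.e. the non-default value):
```scala
import org.apache.carbondata.core.util.CarbonProperties

// Default is true (column pages written to off-heap/unsafe memory);
// set to false to keep column pages on heap.
CarbonProperties.getInstance()
  .addProperty("enable.unsafe.columnpage", "false")
```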
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]