akashrn5 commented on a change in pull request #4210:
URL: https://github.com/apache/carbondata/pull/4210#discussion_r728631522
##########
File path: docs/configuration-parameters.md
##########
@@ -52,6 +52,17 @@ This section provides the details of all the configurations
required for the Car
| carbon.trash.retention.days | 7 | This parameter specifies the number of
days after which the timestamp based subdirectories are expired in the trash
folder. Allowed Min value = 0, Allowed Max Value = 365 days|
| carbon.clean.file.force.allowed | false | This parameter specifies if the
clean files operation with force option is allowed or not.|
| carbon.cdc.minmax.pruning.enabled | false | This parameter defines whether
the min max pruning to be performed on the target table based on the source
data. It will be useful when data is not sparse across target table which
results in better pruning.|
+| spark.sql.warehouse.dir | ../carbon.store | This parameter defines the path
on DFS where carbondata files and metadata will be stored. The configuration
`carbon.storelocation` has been deprecated. For simplicity, we recommended you
remove the configuration of `carbon.storelocation`. If `carbon.storelocation`
and `spark.sql.warehouse.dir` are configured to different paths, exception will
be thrown when CREATE DATABASE and DROP DATABASE to avoid inconsistent database
location.|
Review comment:
`spark.sql.warehouse.dir` is a Spark property, so there is no need to document it
here. Also, there is already a note in the document saying
`carbon.storelocation` is deprecated.
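If it needs mentioning anywhere, it is plain Spark configuration set while
building the session, something like this (a minimal sketch; app name and path
are hypothetical):
```scala
import org.apache.spark.sql.SparkSession

// spark.sql.warehouse.dir is ordinary Spark configuration, set on the session builder.
val spark = SparkSession.builder()
  .appName("carbon-example") // hypothetical app name
  .config("spark.sql.warehouse.dir", "hdfs://namenode:8020/user/warehouse") // hypothetical DFS path
  .getOrCreate()
```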
##########
File path: docs/configuration-parameters.md
##########
@@ -52,6 +52,17 @@ This section provides the details of all the configurations
required for the Car
| carbon.trash.retention.days | 7 | This parameter specifies the number of
days after which the timestamp based subdirectories are expired in the trash
folder. Allowed Min value = 0, Allowed Max Value = 365 days|
| carbon.clean.file.force.allowed | false | This parameter specifies if the
clean files operation with force option is allowed or not.|
| carbon.cdc.minmax.pruning.enabled | false | This parameter defines whether
the min max pruning to be performed on the target table based on the source
data. It will be useful when data is not sparse across target table which
results in better pruning.|
+| spark.sql.warehouse.dir | ../carbon.store | This parameter defines the path
on DFS where carbondata files and metadata will be stored. The configuration
`carbon.storelocation` has been deprecated. For simplicity, we recommended you
remove the configuration of `carbon.storelocation`. If `carbon.storelocation`
and `spark.sql.warehouse.dir` are configured to different paths, exception will
be thrown when CREATE DATABASE and DROP DATABASE to avoid inconsistent database
location.|
+| carbon.blocklet.size | 64 MB | Carbondata files consist of blocklets which
further consists of column pages. As per the latest V3 format, the default size
of a blocklet is 64 MB. In V2 format, the default size of a blocklet was 120000
rows. |
+| carbon.properties.filepath | conf/carbon.properties | This file is by
default present in conf directory on your base project path. Users can
configure all the carbondata related properties in this file. |
+| carbon.date.format | yyyy-MM-dd | This property specifies the format in
which data will be stored in the column with DATE data type. |
Review comment:
If you say the data will be stored in the column in this format, it conveys
wrong info, because we store date as an integer (it is a direct dictionary
column) and time as a long. Instead, you can say that this property specifies
how carbondata parses all incoming date data before it is finally stored in the
carbondata file, or something like that.
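Something along these lines is how the parse format is actually supplied (a
sketch, assuming the usual `CarbonProperties` API):
```scala
import org.apache.carbondata.core.util.CarbonProperties

// Incoming date strings are parsed with this pattern before being stored
// internally as direct-dictionary integers.
CarbonProperties.getInstance()
  .addProperty("carbon.date.format", "yyyy-MM-dd")
```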
##########
File path: docs/configuration-parameters.md
##########
@@ -52,6 +52,17 @@ This section provides the details of all the configurations
required for the Car
| carbon.trash.retention.days | 7 | This parameter specifies the number of
days after which the timestamp based subdirectories are expired in the trash
folder. Allowed Min value = 0, Allowed Max Value = 365 days|
| carbon.clean.file.force.allowed | false | This parameter specifies if the
clean files operation with force option is allowed or not.|
| carbon.cdc.minmax.pruning.enabled | false | This parameter defines whether
the min max pruning to be performed on the target table based on the source
data. It will be useful when data is not sparse across target table which
results in better pruning.|
+| spark.sql.warehouse.dir | ../carbon.store | This parameter defines the path
on DFS where carbondata files and metadata will be stored. The configuration
`carbon.storelocation` has been deprecated. For simplicity, we recommended you
remove the configuration of `carbon.storelocation`. If `carbon.storelocation`
and `spark.sql.warehouse.dir` are configured to different paths, exception will
be thrown when CREATE DATABASE and DROP DATABASE to avoid inconsistent database
location.|
+| carbon.blocklet.size | 64 MB | Carbondata files consist of blocklets which
further consists of column pages. As per the latest V3 format, the default size
of a blocklet is 64 MB. In V2 format, the default size of a blocklet was 120000
rows. |
+| carbon.properties.filepath | conf/carbon.properties | This file is by
default present in conf directory on your base project path. Users can
configure all the carbondata related properties in this file. |
Review comment:
This is a system property that points to the file containing the carbon
properties, so I think this is not the right place to mention it. Maybe you can
add this info to the deployment guide or the quick-start instead.
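For reference, since it is a JVM system property, it is typically supplied on
the command line rather than inside the properties file it points to (a sketch;
the path is hypothetical):
```scala
// Equivalent to passing -Dcarbon.properties.filepath=... via
// spark.driver.extraJavaOptions; must be set before CarbonProperties is
// first initialized for it to take effect.
System.setProperty("carbon.properties.filepath", "/opt/carbon/conf/carbon.properties")
```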
##########
File path: docs/configuration-parameters.md
##########
@@ -151,6 +169,17 @@ This section provides the details of all the
configurations required for the Car
| carbon.partition.max.driver.lru.cache.size | -1 | Maximum memory **(in MB)**
upto which driver can cache partition metadata. Beyond this, least recently
used data will be removed from cache before loading new set of values.
| carbon.mapOrderPushDown.<db_name>_<table_name>.column| empty | If order by
column is in sort column, specify that sort column here to avoid ordering at
map task . |
| carbon.metacache.expiration.seconds | Long.MAX_VALUE | Expiration time **(in
seconds)** for tableInfo cache in CarbonMetadata and tableModifiedTime in
CarbonFileMetastore, after the time configured since last access to the cache
entry, tableInfo and tableModifiedTime will be removed from each cache. Recent
access will refresh the timer. Default value of Long.MAX_VALUE means the cache
will not be expired by time. **NOTE:** At the time when cache is being expired,
queries on the table may fail with NullPointerException. |
+| is.driver.instance | false | This parameter decides if LRU cache for storing
indexes need to be created on driver. By default, it is created on executors. |
+| carbon.input.metrics.update.interval | 500000 | This property determines the
number of records queried after which input metrics are updated to spark. |
+| carbon.use.bitset.pipe.line | true | Carbondata has various optimizations
for faster query execution. Setting this property acts like a catalyst for
filter queries. If set to true, the bitset is passed from one filter to
another, resulting in incremental filtering and improving overall performance |
+
+## Index Configuration
+| Parameter | Default Value | Description |
+|--------------------------------------|---------------|---------------------------------------------------|
+| is.internal.load.call | false | This parameter decides whether the insert
call is triggered internally or by the user. If triggered by user, this ensures
data does not get loaded into MV directly |
Review comment:
This is actually an internal property, so there is no need to add it to the doc.
##########
File path: docs/configuration-parameters.md
##########
@@ -151,6 +169,17 @@ This section provides the details of all the
configurations required for the Car
| carbon.partition.max.driver.lru.cache.size | -1 | Maximum memory **(in MB)**
upto which driver can cache partition metadata. Beyond this, least recently
used data will be removed from cache before loading new set of values.
| carbon.mapOrderPushDown.<db_name>_<table_name>.column| empty | If order by
column is in sort column, specify that sort column here to avoid ordering at
map task . |
| carbon.metacache.expiration.seconds | Long.MAX_VALUE | Expiration time **(in
seconds)** for tableInfo cache in CarbonMetadata and tableModifiedTime in
CarbonFileMetastore, after the time configured since last access to the cache
entry, tableInfo and tableModifiedTime will be removed from each cache. Recent
access will refresh the timer. Default value of Long.MAX_VALUE means the cache
will not be expired by time. **NOTE:** At the time when cache is being expired,
queries on the table may fail with NullPointerException. |
+| is.driver.instance | false | This parameter decides if LRU cache for storing
indexes need to be created on driver. By default, it is created on executors. |
+| carbon.input.metrics.update.interval | 500000 | This property determines the
number of records queried after which input metrics are updated to spark. |
Review comment:
For this one, please add that it can also be set dynamically within a session,
for example:
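A sketch of the dynamic form (the value is arbitrary):
```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().getOrCreate() // assumes a running CarbonData-enabled app

// Dynamically override the property for the current session only.
spark.sql("SET carbon.input.metrics.update.interval=100000")
```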
##########
File path: docs/configuration-parameters.md
##########
@@ -52,6 +52,17 @@ This section provides the details of all the configurations
required for the Car
| carbon.trash.retention.days | 7 | This parameter specifies the number of
days after which the timestamp based subdirectories are expired in the trash
folder. Allowed Min value = 0, Allowed Max Value = 365 days|
| carbon.clean.file.force.allowed | false | This parameter specifies if the
clean files operation with force option is allowed or not.|
| carbon.cdc.minmax.pruning.enabled | false | This parameter defines whether
the min max pruning to be performed on the target table based on the source
data. It will be useful when data is not sparse across target table which
results in better pruning.|
+| spark.sql.warehouse.dir | ../carbon.store | This parameter defines the path
on DFS where carbondata files and metadata will be stored. The configuration
`carbon.storelocation` has been deprecated. For simplicity, we recommended you
remove the configuration of `carbon.storelocation`. If `carbon.storelocation`
and `spark.sql.warehouse.dir` are configured to different paths, exception will
be thrown when CREATE DATABASE and DROP DATABASE to avoid inconsistent database
location.|
+| carbon.blocklet.size | 64 MB | Carbondata files consist of blocklets which
further consists of column pages. As per the latest V3 format, the default size
of a blocklet is 64 MB. In V2 format, the default size of a blocklet was 120000
rows. |
Review comment:
No need to mention V2 here, as no one is using it. For blocklet size you can
just say that each blocklet inside a block is 64 MB by default, and recommend
not to change it unless there is a specific use case or issue.
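If some specific use case does need a different size, it can be set per table
at create time, roughly like this (a sketch, assuming the `TABLE_BLOCKLET_SIZE`
table property; size in MB, table and columns hypothetical):
```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().getOrCreate()

// Hypothetical table; only the TBLPROPERTIES line matters here.
spark.sql(
  """CREATE TABLE IF NOT EXISTS blocklet_example (id INT, name STRING)
    |STORED AS carbondata
    |TBLPROPERTIES('TABLE_BLOCKLET_SIZE'='8')""".stripMargin)
```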
##########
File path: docs/configuration-parameters.md
##########
@@ -99,6 +110,12 @@ This section provides the details of all the configurations
required for the Car
| carbon.enable.bad.record.handling.for.insert | false | by default, disable
the bad record and converter step during "insert into" |
| carbon.load.si.repair | true | by default, enable loading for failed
segments in SI during load/insert command |
| carbon.si.repair.limit | (none) | Number of failed segments to be loaded in
SI when repairing missing segments in SI, by default load all the missing
segments. Supports value from 0 to 2147483646 |
+| carbon.complex.delimiter.level.1 | # | This delimiter is used for parsing
complex data type columns. Level 1 delimiter splits the complex type data
column in a row (eg., a\001b\001c --> Array = {a,b,c}). |
+| carbon.complex.delimiter.level.2 | $ | This delimiter splits the complex
type nested data column in a row. Applies level_1 delimiter & applies level_2
based on complex data type (eg., a\002b\001c\002d --> Array> = {{a,b},{c,d}}). |
+| carbon.complex.delimiter.level.3 | @ | This delimiter splits the complex
type nested data column in a row. Applies level_1 delimiter, applies level_2
and then level_3 delimiter based on complex data type. Used in case of nested
Complex Map type. (eg., 'a\003b\002b\003c\001aa\003bb\002cc\003dd' --> Array Of
Map> = {{a -> b, b -> c},{aa -> bb, cc -> dd}}). |
+| carbon.complex.delimiter.level.4 | (none) | All the levels of delimiters are
used for parsing complex data type columns. All the delimiters are applied
depending on the complexity of the given data type. Level 4 delimiter will be
used for parsing the complex values after level 3 delimiter has been applied
already. |
+| enable.unsafe.columnpage | true | This property enables creation of column
pages while writing on off heap (unsafe) memory. It is set by default |
+| carbon.lucene.compression.mode | speed | Carbondata supports different types
of indices for efficient queries. This parameter decides the compression mode
used by lucene index for index writing. In the default mode, writing speed is
given more priority rather than the index size. |
Review comment:
Please remove this one also, as it is already present in lucene-index-guide.md.
##########
File path: docs/configuration-parameters.md
##########
@@ -52,6 +52,17 @@ This section provides the details of all the configurations
required for the Car
| carbon.trash.retention.days | 7 | This parameter specifies the number of
days after which the timestamp based subdirectories are expired in the trash
folder. Allowed Min value = 0, Allowed Max Value = 365 days|
| carbon.clean.file.force.allowed | false | This parameter specifies if the
clean files operation with force option is allowed or not.|
| carbon.cdc.minmax.pruning.enabled | false | This parameter defines whether
the min max pruning to be performed on the target table based on the source
data. It will be useful when data is not sparse across target table which
results in better pruning.|
+| spark.sql.warehouse.dir | ../carbon.store | This parameter defines the path
on DFS where carbondata files and metadata will be stored. The configuration
`carbon.storelocation` has been deprecated. For simplicity, we recommended you
remove the configuration of `carbon.storelocation`. If `carbon.storelocation`
and `spark.sql.warehouse.dir` are configured to different paths, exception will
be thrown when CREATE DATABASE and DROP DATABASE to avoid inconsistent database
location.|
+| carbon.blocklet.size | 64 MB | Carbondata files consist of blocklets which
further consists of column pages. As per the latest V3 format, the default size
of a blocklet is 64 MB. In V2 format, the default size of a blocklet was 120000
rows. |
+| carbon.properties.filepath | conf/carbon.properties | This file is by
default present in conf directory on your base project path. Users can
configure all the carbondata related properties in this file. |
+| carbon.date.format | yyyy-MM-dd | This property specifies the format in
which data will be stored in the column with DATE data type. |
+| carbon.lock.class | (none) | This specifies the implementation of
ICarbonLock interface to be used for acquiring the locks in case of concurrent
operations |
+| carbon.local.dictionary.enable | (none) | If set to true, this property
enables the generation of local dictionary. Local dictionary enables to map
string and varchar values to numbers which helps in storing the data
efficiently. |
Review comment:
You can remove this property; it is already mentioned in the document
ddl-of-carbondata.md.
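For reference, ddl-of-carbondata.md covers it through the table property, along
these lines (a sketch; table and columns are hypothetical):
```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().getOrCreate()

// Local dictionary is controlled per table via TBLPROPERTIES.
spark.sql(
  """CREATE TABLE IF NOT EXISTS dict_example (name STRING, city STRING)
    |STORED AS carbondata
    |TBLPROPERTIES('LOCAL_DICTIONARY_ENABLE'='true')""".stripMargin)
```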
##########
File path: docs/configuration-parameters.md
##########
@@ -52,6 +52,17 @@ This section provides the details of all the configurations
required for the Car
| carbon.trash.retention.days | 7 | This parameter specifies the number of
days after which the timestamp based subdirectories are expired in the trash
folder. Allowed Min value = 0, Allowed Max Value = 365 days|
| carbon.clean.file.force.allowed | false | This parameter specifies if the
clean files operation with force option is allowed or not.|
| carbon.cdc.minmax.pruning.enabled | false | This parameter defines whether
the min max pruning to be performed on the target table based on the source
data. It will be useful when data is not sparse across target table which
results in better pruning.|
+| spark.sql.warehouse.dir | ../carbon.store | This parameter defines the path
on DFS where carbondata files and metadata will be stored. The configuration
`carbon.storelocation` has been deprecated. For simplicity, we recommended you
remove the configuration of `carbon.storelocation`. If `carbon.storelocation`
and `spark.sql.warehouse.dir` are configured to different paths, exception will
be thrown when CREATE DATABASE and DROP DATABASE to avoid inconsistent database
location.|
+| carbon.blocklet.size | 64 MB | Carbondata files consist of blocklets which
further consists of column pages. As per the latest V3 format, the default size
of a blocklet is 64 MB. In V2 format, the default size of a blocklet was 120000
rows. |
+| carbon.properties.filepath | conf/carbon.properties | This file is by
default present in conf directory on your base project path. Users can
configure all the carbondata related properties in this file. |
+| carbon.date.format | yyyy-MM-dd | This property specifies the format in
which data will be stored in the column with DATE data type. |
+| carbon.lock.class | (none) | This specifies the implementation of
ICarbonLock interface to be used for acquiring the locks in case of concurrent
operations |
+| carbon.local.dictionary.enable | (none) | If set to true, this property
enables the generation of local dictionary. Local dictionary enables to map
string and varchar values to numbers which helps in storing the data
efficiently. |
+| carbon.local.dictionary.decoder.fallback | true | Page Level data will not
be maintained for the blocklet. During fallback, actual data will be retrieved
from the encoded page data using local dictionary. NOTE: Memory footprint
decreases significantly as compared to when this property is set to false |
Review comment:
Same comment as above. First check whether it is already documented anywhere in
the project; if not, you can add it here, otherwise avoid the duplication.
##########
File path: docs/configuration-parameters.md
##########
@@ -52,6 +52,17 @@ This section provides the details of all the configurations
required for the Car
| carbon.trash.retention.days | 7 | This parameter specifies the number of
days after which the timestamp based subdirectories are expired in the trash
folder. Allowed Min value = 0, Allowed Max Value = 365 days|
| carbon.clean.file.force.allowed | false | This parameter specifies if the
clean files operation with force option is allowed or not.|
| carbon.cdc.minmax.pruning.enabled | false | This parameter defines whether
the min max pruning to be performed on the target table based on the source
data. It will be useful when data is not sparse across target table which
results in better pruning.|
+| spark.sql.warehouse.dir | ../carbon.store | This parameter defines the path
on DFS where carbondata files and metadata will be stored. The configuration
`carbon.storelocation` has been deprecated. For simplicity, we recommended you
remove the configuration of `carbon.storelocation`. If `carbon.storelocation`
and `spark.sql.warehouse.dir` are configured to different paths, exception will
be thrown when CREATE DATABASE and DROP DATABASE to avoid inconsistent database
location.|
+| carbon.blocklet.size | 64 MB | Carbondata files consist of blocklets which
further consists of column pages. As per the latest V3 format, the default size
of a blocklet is 64 MB. In V2 format, the default size of a blocklet was 120000
rows. |
+| carbon.properties.filepath | conf/carbon.properties | This file is by
default present in conf directory on your base project path. Users can
configure all the carbondata related properties in this file. |
+| carbon.date.format | yyyy-MM-dd | This property specifies the format in
which data will be stored in the column with DATE data type. |
+| carbon.lock.class | (none) | This specifies the implementation of
ICarbonLock interface to be used for acquiring the locks in case of concurrent
operations |
+| carbon.local.dictionary.enable | (none) | If set to true, this property
enables the generation of local dictionary. Local dictionary enables to map
string and varchar values to numbers which helps in storing the data
efficiently. |
+| carbon.local.dictionary.decoder.fallback | true | Page Level data will not
be maintained for the blocklet. During fallback, actual data will be retrieved
from the encoded page data using local dictionary. NOTE: Memory footprint
decreases significantly as compared to when this property is set to false |
+| spark.deploy.zookeeper.url | (none) | The zookeeper url to connect to for
using zookeeper based locking |
Review comment:
This is also a Spark property, no need to add it here.
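On the carbon side, only the lock type needs switching; the zookeeper quorum
itself is ordinary Spark configuration (a sketch; the quorum address is
hypothetical):
```scala
import org.apache.carbondata.core.util.CarbonProperties

// Switch CarbonData to zookeeper-based locking; spark.deploy.zookeeper.url is
// then passed as a normal Spark conf, e.g.
//   --conf spark.deploy.zookeeper.url=zk1:2181,zk2:2181
CarbonProperties.getInstance()
  .addProperty("carbon.lock.type", "ZOOKEEPERLOCK")
```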
##########
File path: docs/configuration-parameters.md
##########
@@ -52,6 +52,17 @@ This section provides the details of all the configurations
required for the Car
| carbon.trash.retention.days | 7 | This parameter specifies the number of
days after which the timestamp based subdirectories are expired in the trash
folder. Allowed Min value = 0, Allowed Max Value = 365 days|
| carbon.clean.file.force.allowed | false | This parameter specifies if the
clean files operation with force option is allowed or not.|
| carbon.cdc.minmax.pruning.enabled | false | This parameter defines whether
the min max pruning to be performed on the target table based on the source
data. It will be useful when data is not sparse across target table which
results in better pruning.|
+| spark.sql.warehouse.dir | ../carbon.store | This parameter defines the path
on DFS where carbondata files and metadata will be stored. The configuration
`carbon.storelocation` has been deprecated. For simplicity, we recommended you
remove the configuration of `carbon.storelocation`. If `carbon.storelocation`
and `spark.sql.warehouse.dir` are configured to different paths, exception will
be thrown when CREATE DATABASE and DROP DATABASE to avoid inconsistent database
location.|
+| carbon.blocklet.size | 64 MB | Carbondata files consist of blocklets which
further consists of column pages. As per the latest V3 format, the default size
of a blocklet is 64 MB. In V2 format, the default size of a blocklet was 120000
rows. |
+| carbon.properties.filepath | conf/carbon.properties | This file is by
default present in conf directory on your base project path. Users can
configure all the carbondata related properties in this file. |
+| carbon.date.format | yyyy-MM-dd | This property specifies the format in
which data will be stored in the column with DATE data type. |
+| carbon.lock.class | (none) | This specifies the implementation of
ICarbonLock interface to be used for acquiring the locks in case of concurrent
operations |
+| carbon.local.dictionary.enable | (none) | If set to true, this property
enables the generation of local dictionary. Local dictionary enables to map
string and varchar values to numbers which helps in storing the data
efficiently. |
+| carbon.local.dictionary.decoder.fallback | true | Page Level data will not
be maintained for the blocklet. During fallback, actual data will be retrieved
from the encoded page data using local dictionary. NOTE: Memory footprint
decreases significantly as compared to when this property is set to false |
+| spark.deploy.zookeeper.url | (none) | The zookeeper url to connect to for
using zookeeper based locking |
+| carbon.data.file.version | V3 | This specifies carbondata file format
version. Carbondata file format has evolved with time from V1 to V3 in terms of
metadata storage and IO level pruning capabilities. You can find more details
[here](https://carbondata.apache.org/file-structure-of-carbondata.html#carbondata-file-format).
|
+| spark.carbon.hive.schema.store | false | Carbondata currently supports 2
different types of metastores for storing schemas. This property specifies if
Hive metastore is to be used for storing and retrieving table schemas |
+| spark.carbon.sqlastbuilder.classname |
`org.apache.spark.sql.hive.CarbonSqlAstBuilder` | Carbondata extension of
spark's `SparkSqlAstBuilder` that converts an ANTLR ParseTree into a logical
plan. |
Review comment:
I think there is no need to mention this, because just configuring the carbon
extensions class is enough for carbon to work, so documenting this will simply
confuse the user.
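For the user, configuring the extensions class alone is the documented path,
roughly (a sketch):
```scala
import org.apache.spark.sql.SparkSession

// Registering CarbonExtensions is enough; the AST builder is wired up internally.
val spark = SparkSession.builder()
  .config("spark.sql.extensions", "org.apache.spark.sql.CarbonExtensions")
  .getOrCreate()
```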
##########
File path: docs/configuration-parameters.md
##########
@@ -99,6 +110,12 @@ This section provides the details of all the configurations
required for the Car
| carbon.enable.bad.record.handling.for.insert | false | by default, disable
the bad record and converter step during "insert into" |
| carbon.load.si.repair | true | by default, enable loading for failed
segments in SI during load/insert command |
| carbon.si.repair.limit | (none) | Number of failed segments to be loaded in
SI when repairing missing segments in SI, by default load all the missing
segments. Supports value from 0 to 2147483646 |
+| carbon.complex.delimiter.level.1 | # | This delimiter is used for parsing
complex data type columns. Level 1 delimiter splits the complex type data
column in a row (eg., a\001b\001c --> Array = {a,b,c}). |
+| carbon.complex.delimiter.level.2 | $ | This delimiter splits the complex
type nested data column in a row. Applies level_1 delimiter & applies level_2
based on complex data type (eg., a\002b\001c\002d --> Array> = {{a,b},{c,d}}). |
+| carbon.complex.delimiter.level.3 | @ | This delimiter splits the complex
type nested data column in a row. Applies level_1 delimiter, applies level_2
and then level_3 delimiter based on complex data type. Used in case of nested
Complex Map type. (eg., 'a\003b\002b\003c\001aa\003bb\002cc\003dd' --> Array Of
Map> = {{a -> b, b -> c},{aa -> bb, cc -> dd}}). |
+| carbon.complex.delimiter.level.4 | (none) | All the levels of delimiters are
used for parsing complex data type columns. All the delimiters are applied
depending on the complexity of the given data type. Level 4 delimiter will be
used for parsing the complex values after level 3 delimiter has been applied
already. |
+| enable.unsafe.columnpage | true | This property enables creation of column
pages while writing on off heap (unsafe) memory. It is set by default |
Review comment:
You can remove this, as it is already present in usecases.md. Alternatively,
you can just copy the same description here as well.
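If you do copy it, note it is a plain carbon property (a sketch; shown here
being disabled, i.e. the non-default value):
```scala
import org.apache.carbondata.core.util.CarbonProperties

// Default is true (column pages written to off-heap/unsafe memory);
// set to false to keep column pages on heap.
CarbonProperties.getInstance()
  .addProperty("enable.unsafe.columnpage", "false")
```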
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]