akashrn5 commented on a change in pull request #4210:
URL: https://github.com/apache/carbondata/pull/4210#discussion_r736205380
##########
File path: docs/configuration-parameters.md
##########
@@ -70,6 +75,7 @@ This section provides the details of all the configurations
required for the Car
| carbon.load.global.sort.partitions | 0 | The number of partitions to use
when shuffling data for global sort. Default value 0 means to use same number
of map tasks as reduce tasks. **NOTE:** In general, it is recommended to have
2-3 tasks per CPU core in your cluster. |
| carbon.sort.size | 100000 | Number of records to hold in memory to sort and
write intermediate sort temp files. **NOTE:** Memory required for data loading
will increase if this value is set higher, since each thread caches this amount
of records. The number of threads is configured by
*carbon.number.of.cores.while.loading*. |
| carbon.options.bad.records.logger.enable | false | CarbonData can identify
records that do not conform to the schema and isolate them as bad records.
Enabling this configuration makes CarbonData log such bad records.
**NOTE:** If the input data contains many bad records, logging them will slow
down the overall data loading throughput. The data load operation status
depends on the configuration in ***carbon.bad.records.action***. |
+| carbon.options.bad.records.action | FAIL | This property supports four bad
record actions: FORCE, REDIRECT, IGNORE and FAIL. FORCE auto-corrects the data
by storing the bad records as NULL. REDIRECT writes bad records to the raw CSV
instead of loading them. IGNORE neither loads bad records nor writes them to
the raw CSV. FAIL aborts the data load if any bad records are found. |
Review comment:
this property is actually a load-level property. So it's better to mention
the lookup order: first the load option `bad_records_action` is checked; if
not present, the `carbon.options.bad.records.action` load property is checked;
if that is not configured, the value is taken from the
`carbon.bad.records.action` system-level property; and if none is configured,
the default value, which is FAIL, is used.
This will clarify the priority order carbon follows.
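The lookup order described above can be sketched as follows. This is an illustrative Python sketch, not CarbonData's actual implementation; the function and parameter names are hypothetical, only the property-name strings and the FAIL default come from the review.

```python
# Illustrative sketch of the bad-records-action lookup priority described in
# the review. Hypothetical names; not CarbonData source code.

DEFAULT_BAD_RECORDS_ACTION = "FAIL"

def resolve_bad_records_action(load_options, session_properties, system_properties):
    """Return the effective bad records action, checking sources in priority order."""
    # 1. Per-load option passed with the LOAD statement.
    if "bad_records_action" in load_options:
        return load_options["bad_records_action"]
    # 2. Load-level property configured in the session.
    if "carbon.options.bad.records.action" in session_properties:
        return session_properties["carbon.options.bad.records.action"]
    # 3. System-level property.
    if "carbon.bad.records.action" in system_properties:
        return system_properties["carbon.bad.records.action"]
    # 4. Nothing configured: fall back to the default, which is FAIL.
    return DEFAULT_BAD_RECORDS_ACTION
```

A higher-priority source always wins, so a per-load `bad_records_action` overrides both session and system settings.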
##########
File path: docs/configuration-parameters.md
##########
@@ -151,6 +161,16 @@ This section provides the details of all the
configurations required for the Car
| carbon.partition.max.driver.lru.cache.size | -1 | Maximum memory **(in MB)**
up to which the driver can cache partition metadata. Beyond this, the least
recently used data will be removed from the cache before loading a new set of
values. |
| carbon.mapOrderPushDown.<db_name>_<table_name>.column | empty | If the order
by column is a sort column, specify that sort column here to avoid ordering at
the map task. |
| carbon.metacache.expiration.seconds | Long.MAX_VALUE | Expiration time **(in
seconds)** for the tableInfo cache in CarbonMetadata and tableModifiedTime in
CarbonFileMetastore. Once the configured time has elapsed since the last access
to a cache entry, the tableInfo and tableModifiedTime entries are removed from
their respective caches. A recent access refreshes the timer. The default value
of Long.MAX_VALUE means cache entries never expire by time. **NOTE:** While a
cache entry is being expired, queries on the table may fail with a
NullPointerException. |
+| is.driver.instance | false | This parameter decides whether the LRU cache for
storing indexes needs to be created on the driver. By default, it is created on
the executors. |
+| carbon.input.metrics.update.interval | 500000 | This property determines the
number of records queried after which input metrics are updated to Spark. It
can also be set dynamically within the Spark session itself. |
Review comment:
same as above comment
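The record-count-based update interval can be illustrated with a small sketch (hypothetical Python, not CarbonData code): metric updates are batched and published only once per `interval` records rather than on every record, so a smaller interval gives fresher numbers at the cost of more frequent updates.

```python
# Illustrative sketch of an interval-based input metrics updater, in the spirit
# of carbon.input.metrics.update.interval. Hypothetical names; not CarbonData code.

class InputMetricsUpdater:
    def __init__(self, interval, publish):
        self.interval = interval    # e.g. 500000 records by default
        self.publish = publish      # callback that reports the batched count upstream
        self.pending = 0            # records read since the last published update

    def record_read(self):
        """Count one record; publish and reset once the interval is reached."""
        self.pending += 1
        if self.pending >= self.interval:
            self.publish(self.pending)
            self.pending = 0
```

With `interval=3`, reading 7 records publishes twice (after records 3 and 6) and leaves one record pending.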
##########
File path: docs/configuration-parameters.md
##########
@@ -119,6 +136,7 @@ This section provides the details of all the configurations
required for the Car
| carbon.enable.range.compaction | true | Configures whether Range-based
Compaction is used for RANGE_COLUMN. If true, the data remains organized in
ranges after compaction as well. |
| carbon.si.segment.merge | false | Setting this to true degrades LOAD
performance. When the number of small files increases for SI segments (this can
happen because the number of columns is small and only the position id and
reference columns are stored), the user can either set this to true, which will
merge the data files during upcoming loads, or run the SI refresh command,
which does this job for all segments. (REFRESH INDEX <index_table>) |
| carbon.partition.data.on.tasklevel | false | When enabled, tasks launched for
a local sort partition load will be based on one task per node, and compaction
will be performed at task level for a partition. Load performance might
degrade, because in the local sort case the number of tasks launched equals the
number of nodes. For compaction, memory consumption will be lower, as more
tasks will be launched per partition. |
+| carbon.minor.compaction.size | (none) | Minor compaction originally worked
based on the number of segments (by default 4), with no control over the size
of the segments being compacted. This parameter was introduced to exclude
segments whose size is greater than the configured threshold, so that the
overall IO and time taken decrease. |
Review comment:
if this is dynamically configurable, please also add this property to the
dynamic properties section
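The size-based exclusion that carbon.minor.compaction.size introduces can be sketched as follows (hypothetical Python, not the actual CarbonData compaction code):

```python
# Illustrative sketch of how a size threshold like carbon.minor.compaction.size
# trims the minor-compaction candidate list. Hypothetical names; not CarbonData source.

def minor_compaction_candidates(segment_sizes_mb, threshold_mb=None):
    """Return the segment ids eligible for minor compaction.

    segment_sizes_mb: mapping of segment id -> segment size in MB.
    threshold_mb: value of carbon.minor.compaction.size, or None when unset
                  (when unset, all segments remain eligible, as before).
    """
    if threshold_mb is None:
        return list(segment_sizes_mb)
    # Segments larger than the threshold are skipped, reducing the IO and time
    # spent re-writing already-large segments.
    return [seg for seg, size in segment_sizes_mb.items() if size <= threshold_mb]
```

With a 512 MB threshold, a 2 GB segment would be left out of the candidate list while the small segments are still merged.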
##########
File path: docs/configuration-parameters.md
##########
@@ -52,6 +52,11 @@ This section provides the details of all the configurations
required for the Car
| carbon.trash.retention.days | 7 | This parameter specifies the number of
days after which the timestamp based subdirectories are expired in the trash
folder. Allowed Min value = 0, Allowed Max Value = 365 days|
| carbon.clean.file.force.allowed | false | This parameter specifies if the
clean files operation with force option is allowed or not.|
| carbon.cdc.minmax.pruning.enabled | false | This parameter defines whether
min/max pruning is performed on the target table based on the source data. It
is useful when data is not sparse across the target table, which results in
better pruning. |
+| carbon.blocklet.size | 64 MB | CarbonData files consist of blocklets, which
further consist of column pages. As per the latest V3 format, the default size
of a blocklet is 64 MB. It is recommended not to change this value except for
specific use cases. |
Review comment:
```suggestion
| carbon.blocklet.size | 64 MB | A CarbonData file consists of blocklets, which
further consist of column pages. As per the latest V3 format, the default size
of a blocklet is 64 MB. It is recommended not to change this value except for
specific use cases. |
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]