akashrn5 commented on a change in pull request #4210:
URL: https://github.com/apache/carbondata/pull/4210#discussion_r736205380
##########
File path: docs/configuration-parameters.md
##########

@@ -70,6 +75,7 @@ This section provides the details of all the configurations required for the Car
| carbon.load.global.sort.partitions | 0 | The number of partitions to use when shuffling data for global sort. Default value 0 means to use same number of map tasks as reduce tasks. **NOTE:** In general, it is recommended to have 2-3 tasks per CPU core in your cluster. |
| carbon.sort.size | 100000 | Number of records to hold in memory to sort and write intermediate sort temp files. **NOTE:** Memory required for data loading will increase if you make this value bigger. Besides, each thread will cache this amount of records. The number of threads is configured by *carbon.number.of.cores.while.loading*. |
| carbon.options.bad.records.logger.enable | false | CarbonData can identify the records that are not conformant to schema and isolate them as bad records. Enabling this configuration will make CarbonData log such bad records. **NOTE:** If the input data contains many bad records, logging them will slow down the overall data loading throughput. The data load operation status would depend on the configuration in ***carbon.bad.records.action***. |
+| carbon.options.bad.records.action | FAIL | This property has four types of bad record actions: FORCE, REDIRECT, IGNORE and FAIL. If set to FORCE then it auto-corrects the data by storing the bad records as NULL. If set to REDIRECT then bad records are written to the raw CSV instead of being loaded. If set to IGNORE then bad records are neither loaded nor written to the raw CSV. If set to FAIL then data loading fails if any bad records are found. |

Review comment:
this property is actually the load level property.
So it's better to mention that it first takes the load options and checks for `bad_records_action`; if not present, it then checks the `carbon.options.bad.records.action` load property; if that is not configured, it takes the value from the `carbon.bad.records.action` system level property; and if that is not configured either, it considers the default value, which is FAIL. This will clarify the priority of how we consider it in carbon.

##########
File path: docs/configuration-parameters.md
##########

@@ -151,6 +161,16 @@ This section provides the details of all the configurations required for the Car
| carbon.partition.max.driver.lru.cache.size | -1 | Maximum memory **(in MB)** up to which driver can cache partition metadata. Beyond this, least recently used data will be removed from cache before loading new set of values. |
| carbon.mapOrderPushDown.<db_name>_<table_name>.column | empty | If order by column is in sort column, specify that sort column here to avoid ordering at map task. |
| carbon.metacache.expiration.seconds | Long.MAX_VALUE | Expiration time **(in seconds)** for tableInfo cache in CarbonMetadata and tableModifiedTime in CarbonFileMetastore. After the configured time since last access to the cache entry, tableInfo and tableModifiedTime will be removed from each cache. Recent access will refresh the timer. Default value of Long.MAX_VALUE means the cache will not be expired by time. **NOTE:** At the time when cache is being expired, queries on the table may fail with NullPointerException. |
+| is.driver.instance | false | This parameter decides if LRU cache for storing indexes need to be created on driver. By default, it is created on executors. |
+| carbon.input.metrics.update.interval | 500000 | This property determines the number of records queried after which input metrics are updated to spark. It can be set dynamically within spark session itself as well. |

Review comment:
same as above comment

##########
File path: docs/configuration-parameters.md
##########

@@ -119,6 +136,7 @@ This section provides the details of all the configurations required for the Car
| carbon.enable.range.compaction | true | To configure Range-based Compaction to be used or not for RANGE_COLUMN. If true, after compaction also the data would be present in ranges. |
| carbon.si.segment.merge | false | Making this true degrades the LOAD performance. When the number of small files increases for SI segments (it can happen as number of columns will be less and we store position id and reference columns), the user can either set this to true, which will merge the data files for upcoming loads, or run the SI refresh command, which does this job for all segments. (REFRESH INDEX <index_table>) |
| carbon.partition.data.on.tasklevel | false | When enabled, tasks launched for Local sort partition load will be based on one node one task. Compaction will be performed based on task level for a partition. Load performance might be degraded, because the number of tasks launched is equal to the number of nodes in case of local sort. For compaction, memory consumption will be less, as more tasks will be launched for a partition. |
+| carbon.minor.compaction.size | (none) | Minor compaction originally worked based on the number of segments (by default 4). However in that scenario, there was no control over the size of segments to be compacted.
This parameter was introduced to exclude segments whose size is greater than the configured threshold, so that the overall IO and time taken decrease. |

Review comment:
if dynamically configurable, please add this property in the dynamic property section

##########
File path: docs/configuration-parameters.md
##########

@@ -52,6 +52,11 @@ This section provides the details of all the configurations required for the Car
| carbon.trash.retention.days | 7 | This parameter specifies the number of days after which the timestamp based subdirectories are expired in the trash folder. Allowed Min value = 0, Allowed Max value = 365 days. |
| carbon.clean.file.force.allowed | false | This parameter specifies if the clean files operation with force option is allowed or not. |
| carbon.cdc.minmax.pruning.enabled | false | This parameter defines whether the min max pruning is to be performed on the target table based on the source data. It will be useful when data is not sparse across the target table, which results in better pruning. |
+| carbon.blocklet.size | 64 MB | Carbondata files consist of blocklets which further consists of column pages. As per the latest V3 format, the default size of a blocklet is 64 MB. It is recommended not to change this value except for some specific use case. |

Review comment:
```suggestion
| carbon.blocklet.size | 64 MB | Carbondata file consist of blocklets which further consists of column pages. As per the latest V3 format, the default size of a blocklet is 64 MB. It is recommended not to change this value except for some specific use case. |
```

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@carbondata.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
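The fallback chain in the first review comment (load option, then load property, then system property, then the FAIL default) can be sketched as a simple first-match lookup. This is an illustrative sketch only, not CarbonData's actual implementation: the function name and the dict-based stand-ins for the load options and `CarbonProperties` are invented for the example.

```python
# Hypothetical sketch of the resolution order described in the review;
# CarbonData's real code reads these from LOAD options and CarbonProperties.
DEFAULT_BAD_RECORDS_ACTION = "FAIL"

def resolve_bad_records_action(load_options, carbon_properties):
    """Return the effective bad-records action for one data load."""
    # 1. Highest priority: the option on the LOAD DATA statement itself.
    value = load_options.get("bad_records_action")
    if value is None:
        # 2. Next: the load-level property.
        value = carbon_properties.get("carbon.options.bad.records.action")
    if value is None:
        # 3. Next: the system-level property.
        value = carbon_properties.get("carbon.bad.records.action")
    # 4. Finally: the hard-coded default.
    return (value or DEFAULT_BAD_RECORDS_ACTION).upper()

# The load option wins over both properties:
print(resolve_bad_records_action(
    {"bad_records_action": "redirect"},
    {"carbon.bad.records.action": "IGNORE"}))  # -> REDIRECT

# Nothing configured anywhere falls back to FAIL:
print(resolve_bad_records_action({}, {}))      # -> FAIL
```

Documenting the lookup in this order, as the reviewer suggests, makes it clear why setting only the system-level property has no effect when a load-level value is present.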