akashrn5 commented on a change in pull request #4210:
URL: https://github.com/apache/carbondata/pull/4210#discussion_r736205380
##########
File path: docs/configuration-parameters.md
##########
@@ -70,6 +75,7 @@ This section provides the details of all the configurations
required for the Car
| carbon.load.global.sort.partitions | 0 | The number of partitions to use
when shuffling data for global sort. Default value 0 means to use same number
of map tasks as reduce tasks. **NOTE:** In general, it is recommended to have
2-3 tasks per CPU core in your cluster. |
| carbon.sort.size | 100000 | Number of records to hold in memory to sort and
write intermediate sort temp files. **NOTE:** Memory required for data loading
will increase if this value is set higher, since each thread caches this amount
of records. The number of threads is configured by
*carbon.number.of.cores.while.loading*. |
| carbon.options.bad.records.logger.enable | false | CarbonData can identify
records that do not conform to the schema and isolate them as bad records.
Enabling this configuration makes CarbonData log such bad records.
**NOTE:** If the input data contains many bad records, logging them will slow
down the overall data loading throughput. The data load operation status
depends on the configuration in ***carbon.bad.records.action***. |
+| carbon.options.bad.records.action | FAIL | This property supports four bad
record actions: FORCE, REDIRECT, IGNORE and FAIL. FORCE auto-corrects the data
by storing the bad records as NULL. REDIRECT writes bad records to the raw CSV
instead of loading them. IGNORE neither loads bad records nor writes them to
the raw CSV. FAIL aborts the data load if any bad records are found. |
Review comment:
this property is actually a load-level property. So it's better to mention
the lookup order: first the load option `bad_records_action` is checked; if
not present, the `carbon.options.bad.records.action` load property is checked;
if that is not configured, the value is taken from the
`carbon.bad.records.action` system-level property; and if none is configured,
the default value, which is FAIL, is used.
This will clarify the priority order carbon follows.
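The lookup order described above can be sketched as follows. This is an illustrative Python sketch, not CarbonData's actual implementation; the function and parameter names are hypothetical, only the property-name strings and the FAIL default come from the review.

```python
# Illustrative sketch of the bad-records-action lookup priority described in
# the review. Hypothetical names; not CarbonData source code.

DEFAULT_BAD_RECORDS_ACTION = "FAIL"

def resolve_bad_records_action(load_options, session_properties, system_properties):
    """Return the effective bad records action, checking sources in priority order."""
    # 1. Per-load option passed with the LOAD statement.
    if "bad_records_action" in load_options:
        return load_options["bad_records_action"]
    # 2. Load-level property configured in the session.
    if "carbon.options.bad.records.action" in session_properties:
        return session_properties["carbon.options.bad.records.action"]
    # 3. System-level property.
    if "carbon.bad.records.action" in system_properties:
        return system_properties["carbon.bad.records.action"]
    # 4. Nothing configured: fall back to the default, which is FAIL.
    return DEFAULT_BAD_RECORDS_ACTION
```

A higher-priority source always wins, so a per-load `bad_records_action` overrides both session and system settings.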
##########
File path: docs/configuration-parameters.md
##########
@@ -151,6 +161,16 @@ This section provides the details of all the
configurations required for the Car
| carbon.partition.max.driver.lru.cache.size | -1 | Maximum memory **(in MB)**
up to which the driver can cache partition metadata. Beyond this, the least
recently used data will be removed from the cache before loading a new set of
values. |
| carbon.mapOrderPushDown.<db_name>_<table_name>.column | empty | If the order
by column is a sort column, specify that sort column here to avoid ordering at
the map task. |
| carbon.metacache.expiration.seconds | Long.MAX_VALUE | Expiration time **(in
seconds)** for the tableInfo cache in CarbonMetadata and tableModifiedTime in
CarbonFileMetastore. Once the configured time has elapsed since the last access
to a cache entry, the tableInfo and tableModifiedTime entries are removed from
their respective caches. A recent access refreshes the timer. The default value
of Long.MAX_VALUE means cache entries never expire by time. **NOTE:** While a
cache entry is being expired, queries on the table may fail with a
NullPointerException. |
+| is.driver.instance | false | This parameter decides whether the LRU cache for
storing indexes needs to be created on the driver. By default, it is created on
the executors. |
+| carbon.input.metrics.update.interval | 500000 | This property determines the
number of records queried after which input metrics are updated to Spark. It
can also be set dynamically within the Spark session itself. |
Review comment:
same as above comment
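The record-count-based update interval can be illustrated with a small sketch (hypothetical Python, not CarbonData code): metric updates are batched and published only once per `interval` records rather than on every record, so a smaller interval gives fresher numbers at the cost of more frequent updates.

```python
# Illustrative sketch of an interval-based input metrics updater, in the spirit
# of carbon.input.metrics.update.interval. Hypothetical names; not CarbonData code.

class InputMetricsUpdater:
    def __init__(self, interval, publish):
        self.interval = interval    # e.g. 500000 records by default
        self.publish = publish      # callback that reports the batched count upstream
        self.pending = 0            # records read since the last published update

    def record_read(self):
        """Count one record; publish and reset once the interval is reached."""
        self.pending += 1
        if self.pending >= self.interval:
            self.publish(self.pending)
            self.pending = 0
```

With `interval=3`, reading 7 records publishes twice (after records 3 and 6) and leaves one record pending.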
##########
File path: docs/configuration-parameters.md
##########
@@ -119,6 +136,7 @@ This section provides the details of all the configurations
required for the Car
| carbon.enable.range.compaction | true | Configures whether Range-based
Compaction is used for RANGE_COLUMN. If true, the data remains organized in
ranges after compaction as well. |
| carbon.si.segment.merge | false | Setting this to true degrades LOAD
performance. When the number of small files increases for SI segments (this can
happen because the number of columns is small and only the position id and
reference columns are stored), the user can either set this to true, which will
merge the data files during upcoming loads, or run the SI refresh command,
which does this job for all segments. (REFRESH INDEX <index_table>) |
| carbon.partition.data.on.tasklevel | false | When enabled, tasks launched for
a local sort partition load will be based on one task per node, and compaction
will be performed at task level for a partition. Load performance might
degrade, because in the local sort case the number of tasks launched equals the
number of nodes. For compaction, memory consumption will be lower, as more
tasks will be launched per partition. |
+| carbon.minor.compaction.size | (none) | Minor compaction originally worked
based on the number of segments (by default 4), with no control over the size
of the segments being compacted. This parameter was introduced to exclude
segments whose size is greater than the configured threshold, so that the
overall IO and time taken decrease. |
Review comment:
if this is dynamically configurable, please also add this property to the
dynamic properties section
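The size-based exclusion that carbon.minor.compaction.size introduces can be sketched as follows (hypothetical Python, not the actual CarbonData compaction code):

```python
# Illustrative sketch of how a size threshold like carbon.minor.compaction.size
# trims the minor-compaction candidate list. Hypothetical names; not CarbonData source.

def minor_compaction_candidates(segment_sizes_mb, threshold_mb=None):
    """Return the segment ids eligible for minor compaction.

    segment_sizes_mb: mapping of segment id -> segment size in MB.
    threshold_mb: value of carbon.minor.compaction.size, or None when unset
                  (when unset, all segments remain eligible, as before).
    """
    if threshold_mb is None:
        return list(segment_sizes_mb)
    # Segments larger than the threshold are skipped, reducing the IO and time
    # spent re-writing already-large segments.
    return [seg for seg, size in segment_sizes_mb.items() if size <= threshold_mb]
```

With a 512 MB threshold, a 2 GB segment would be left out of the candidate list while the small segments are still merged.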
##########
File path: docs/configuration-parameters.md
##########
@@ -52,6 +52,11 @@ This section provides the details of all the configurations
required for the Car
| carbon.trash.retention.days | 7 | This parameter specifies the number of
days after which the timestamp based subdirectories are expired in the trash
folder. Allowed Min value = 0, Allowed Max Value = 365 days|
| carbon.clean.file.force.allowed | false | This parameter specifies if the
clean files operation with force option is allowed or not.|
| carbon.cdc.minmax.pruning.enabled | false | This parameter defines whether
min/max pruning is performed on the target table based on the source data. It
is useful when data is not sparse across the target table, which results in
better pruning. |
+| carbon.blocklet.size | 64 MB | CarbonData files consist of blocklets, which
further consist of column pages. As per the latest V3 format, the default size
of a blocklet is 64 MB. It is recommended not to change this value except for
specific use cases. |
Review comment:
```suggestion
| carbon.blocklet.size | 64 MB | A CarbonData file consists of blocklets, which
further consist of column pages. As per the latest V3 format, the default size
of a blocklet is 64 MB. It is recommended not to change this value except for
specific use cases. |
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]