[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3627: [CARBONDATA-3710] Make stage files queryable

GitBox Mon, 24 Feb 2020 22:06:18 -0800

ajantha-bhat commented on a change in pull request #3627: [CARBONDATA-3710] 
Make stage files queryable
URL: https://github.com/apache/carbondata/pull/3627#discussion_r383674740


 ##########
 File path: docs/configuration-parameters.md
 ##########
 @@ -144,6 +144,7 @@ This section provides the details of all the 
configurations required for the Car
 | carbon.heap.memory.pooling.threshold.bytes | 1048576 | CarbonData supports 
unsafe operations of Java to avoid GC overhead for certain operations. Using 
unsafe, memory can be allocated on Java Heap or off heap. This configuration 
controls the allocation mechanism on Java HEAP. If the heap memory allocations 
of the given size is greater or equal than this value,it should go through the 
pooling mechanism. But if set this size to -1, it should not go through the 
pooling mechanism. Default value is 1048576(1MB, the same as Spark). Value to 
be specified in bytes. |
 | carbon.push.rowfilters.for.vector | false | When enabled complete row 
filters will be handled by carbon in case of vector. If it is disabled then 
only page level pruning will be done by carbon and row level filtering will be 
done by spark for vector. And also there are scan optimizations in carbon to 
avoid multiple data copies when this parameter is set to false. There is no 
change in flow for non-vector based queries. |
 | carbon.query.prefetch.enable | true | By default this property is true, so 
prefetch is used in query to read next blocklet asynchronously in other thread 
while processing current blocklet in main thread. This can help to reduce CPU 
idle time. Setting this property false will disable this prefetch feature in 
query. |
+| carbon.query.stage.input.enable | false | Stage input files are data files 
written by external applications (such as Flink) but has not been loaded into 
carbon table. Enabling this configuration makes query includes these files, 
thus makes query on latest data. However, since these files are not indexed, 
query maybe slower due to full scan is required for these files. |
 
 Review comment:
   How are pushed-down filters handled? For non-stage files, carbon applies the 
filter, but stage files cannot be filtered. Is it possible that Spark does not 
apply the filter to carbon's results, because the filter has already been 
pushed down to carbon?
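
   A toy sketch of the concern, in plain Python. This is not CarbonData or 
Spark code: `scan_carbon`, its parameters, and the sample rows are all 
hypothetical. It assumes the engine trusts the scan output and does not 
re-apply a filter that was pushed down.

```python
# Toy model only; every name here is hypothetical, not a CarbonData/Spark API.

def scan_carbon(indexed_rows, stage_rows, pushed_filter,
                include_stage, filter_stage):
    """Return the rows a carbon-like scan would hand back to the engine."""
    # Loaded (indexed) data: the pushed-down filter is applied by the scan.
    result = [r for r in indexed_rows if pushed_filter(r)]
    if include_stage:
        if filter_stage:
            # Stage rows are filtered row by row after a full scan.
            result += [r for r in stage_rows if pushed_filter(r)]
        else:
            # Stage rows are returned unfiltered (the case being asked about).
            result += list(stage_rows)
    return result


indexed = [{"id": 1, "city": "a"}, {"id": 2, "city": "b"}]
stage = [{"id": 3, "city": "a"}, {"id": 4, "city": "b"}]


def is_city_a(row):
    return row["city"] == "a"


# Assumption: the engine does not re-apply the filter to the scan output.
leaky = scan_carbon(indexed, stage, is_city_a,
                    include_stage=True, filter_stage=False)
safe = scan_carbon(indexed, stage, is_city_a,
                   include_stage=True, filter_stage=True)

leaky_ids = [r["id"] for r in leaky]
safe_ids = [r["id"] for r in safe]
print(leaky_ids, safe_ids)
```

   In this model the unfiltered variant returns the row with id 4 even though 
it does not match the predicate. That suggests either the scan has to filter 
stage rows itself, or the filter has to be reported back to Spark as 
unhandled.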

