jihoonson edited a comment on issue #10821: URL: https://github.com/apache/druid/issues/10821#issuecomment-770426661
Hi, thanks for your interest in Druid. I see many questions are about the realtime node, which was [completely removed in 0.16.0](https://druid.apache.org/docs/latest/ingestion/standalone-realtime.html). I would suggest looking at the Kafka/Kinesis indexing services, which are the modern and standard way to do streaming ingestion in Druid.

> 1. When stored in deep storage periodically in real-time node, what is the standard for storing in deep storage? (For example, once a minute in deep storage, etc.)

I don't remember much about the realtime node. The Kafka and Kinesis indexing services both use dynamic partitioning, which partitions data based on size. The size limit is controlled by [`maxRowsPerSegment` and `maxTotalRows`](https://druid.apache.org/docs/latest/development/extensions-core/kafka-ingestion.html#kafkasupervisortuningconfig). Whenever a Kafka/Kinesis indexing task hits one of those limits, it pushes the segments to deep storage. Finally, those tasks publish all remaining segments at the end of their lifecycle, after [`taskDuration`](https://druid.apache.org/docs/latest/development/extensions-core/kafka-ingestion.html#kafkasupervisorioconfig).

> 2. When saving the result obtained from the history node in the broker node, is it saved in segment units? When saving, I am wondering whether the entire segment is saved or only the result value of the query is saved correctly.

The broker is designed to merge results in a streaming fashion from historicals and streaming ingestion tasks, so it does not usually store query results. However, it can _buffer_ some of the results in its memory if the client doesn't read query results fast enough. There are some exceptions: queries with subqueries or sub-totals need to store intermediate results in broker memory. Only intermediate results are stored in those cases.

> 2-1) If what I understand correctly, there is a cache in the historical node.
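As an illustration of where those limits live, here is a minimal (and hypothetical) Kafka supervisor spec fragment; the topic name and values are made-up examples, not defaults:

```json
{
  "type": "kafka",
  "ioConfig": {
    "topic": "example-topic",
    "taskDuration": "PT1H"
  },
  "tuningConfig": {
    "type": "kafka",
    "maxRowsPerSegment": 5000000,
    "maxTotalRows": 20000000
  }
}
```

With a spec like this, a task hands segments off to deep storage whenever a segment reaches `maxRowsPerSegment` rows or the task's total reaches `maxTotalRows`, and publishes whatever remains when `taskDuration` elapses.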
> When a query requests cache data,
> if the query is different but the result is the same, how do you handle it?

Historicals use two types of caches: the Linux disk cache and the segment-level result cache. Every segment on a historical is memory-mapped and is cached in the disk cache when queries read it. The segment-level result cache can be thought of as a map _from queries to by-segment results_. Different queries will hit different results.

> 3. Can the result value of a real-time node be stored directly in a historical node without going through deep storage?

No. Every indexing task must go through the steps explained [here](https://druid.apache.org/docs/latest/design/architecture.html#indexing-and-handoff).

> 4. When indexing in a real-time node, is the bitmap encoding random or is there a criterion (if so, what is the criterion)

Sorry, I don't understand your question. What do you mean by random bitmap encoding?

> 5. What is the reference point at which the real-time node index is flushed in in-memory or off-heap memory?

Similar to question 1, I don't remember much about the realtime node. Kafka/Kinesis tasks use [`maxRowsInMemory` and `maxBytesInMemory`](https://druid.apache.org/docs/latest/development/extensions-core/kafka-ingestion.html#kafkasupervisortuningconfig).
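To make the flush behavior concrete, here is a toy sketch of the `maxRowsInMemory`/`maxBytesInMemory` thresholds. The class and method names are illustrative only, not Druid's actual internals: a real task spills its in-memory index to disk as an intermediate persist, whereas this sketch just counts the flushes.

```python
class InMemoryIndex:
    """Toy buffer mimicking the maxRowsInMemory/maxBytesInMemory flush
    thresholds. Hypothetical names, not Druid internals."""

    def __init__(self, max_rows_in_memory: int, max_bytes_in_memory: int):
        self.max_rows = max_rows_in_memory
        self.max_bytes = max_bytes_in_memory
        self.rows = []
        self.bytes_used = 0
        self.persisted_batches = 0

    def add(self, row: dict) -> None:
        self.rows.append(row)
        self.bytes_used += len(str(row))  # crude size estimate
        # Flush when EITHER limit is hit, mirroring how a task persists
        # its in-memory index once a threshold is exceeded.
        if len(self.rows) >= self.max_rows or self.bytes_used >= self.max_bytes:
            self.persist()

    def persist(self) -> None:
        # In a real task this would spill the index to disk; here we only
        # count the flush and reset the buffer.
        self.persisted_batches += 1
        self.rows.clear()
        self.bytes_used = 0


index = InMemoryIndex(max_rows_in_memory=3, max_bytes_in_memory=10_000)
for i in range(7):
    index.add({"ts": i, "metric": i * 2})
print(index.persisted_batches)  # 2: flushed after the 3rd and 6th rows
```

The same either/or logic applies to the size-based handoff limits from question 1, just with segments pushed to deep storage instead of in-memory rows persisted to disk.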
