jihoonson edited a comment on issue #10821:
URL: https://github.com/apache/druid/issues/10821#issuecomment-770426661


   Hi, thanks for your interest in Druid. I see many questions are about the 
realtime node, which was [completely removed in 
0.16.0](https://druid.apache.org/docs/latest/ingestion/standalone-realtime.html).
 I would suggest looking at the Kafka/Kinesis indexing services, which are the 
modern, standard way to do streaming ingestion in Druid.
   
   > 1. When stored in deep storage periodically in real-time node, what is the 
standard for storing in deep storage? (For example, once a minute in deep 
storage, etc.)
   
   I don't remember much about the realtime node. The Kafka and Kinesis 
indexing services both use dynamic partitioning, which partitions data 
based on size. The size limit is controlled by [`maxRowsPerSegment` and 
`maxTotalRows`](https://druid.apache.org/docs/latest/development/extensions-core/kafka-ingestion.html#kafkasupervisortuningconfig).
 That means whenever a Kafka/Kinesis indexing task hits one of those limits, 
it pushes the segments to deep storage. Finally, those tasks publish all 
remaining segments at the end of their lifecycle, after 
[`taskDuration`](https://druid.apache.org/docs/latest/development/extensions-core/kafka-ingestion.html#kafkasupervisorioconfig).
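   To make the knobs above concrete, here is a minimal sketch of where those properties live in a Kafka supervisor spec. The field names (`maxRowsPerSegment`, `maxTotalRows`, `taskDuration`) are the documented ones; the topic name and the numeric values are purely illustrative, not recommendations:

```json
{
  "type": "kafka",
  "ioConfig": {
    "topic": "example-topic",
    "taskDuration": "PT1H"
  },
  "tuningConfig": {
    "type": "kafka",
    "maxRowsPerSegment": 5000000,
    "maxTotalRows": 20000000
  }
}
```

   With this spec, a task pushes segments to deep storage whenever it accumulates 5M rows in a segment or 20M rows across all segments, and publishes whatever remains when its one-hour `taskDuration` elapses.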
   
   > 2. When saving the result obtained from the history node in the broker 
node, is it saved in segment units? When saving, I am wondering whether the 
entire segment is saved or only the result value of the query is saved 
correctly.
   
   The broker is designed to merge results in a streaming fashion from 
historicals and streaming ingestion tasks, so it does not usually store query 
results. However, it can _buffer_ some of the results in its memory if the 
client doesn't read query results fast enough. There are some exceptions: queries 
with subqueries or sub-totals need to store intermediate 
results in broker memory. Only intermediate results are stored in these cases.
    
   > 2-1) If what I understand correctly, there is a cache in the historical 
node.
   > When a query requests cache data,
   > If the query is different but the result is the same, how do you handle it?
   
   Historicals use two types of caches: the Linux disk (page) cache and the 
segment-level result cache. Every segment in historicals is memory-mapped and 
is cached in the disk cache when queries read it. The segment-level result 
cache can be thought of as a map _from queries to by-segment results_. Different 
queries will hit different results.
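   To answer the "different query, same result" part concretely, here is an illustrative sketch (not Druid's actual implementation) of a segment-level result cache keyed by a query fingerprint plus a segment id. Because the key is derived from the query itself, not from its result, two different queries occupy separate cache entries even if their results happen to be equal:

```python
import hashlib
import json

class SegmentResultCache:
    """Toy per-segment result cache, keyed by (query fingerprint, segment id)."""

    def __init__(self):
        self._cache = {}

    @staticmethod
    def _fingerprint(query: dict) -> str:
        # Canonicalize the query so logically identical queries share one key.
        return hashlib.sha256(
            json.dumps(query, sort_keys=True).encode()
        ).hexdigest()

    def get(self, query: dict, segment_id: str):
        return self._cache.get((self._fingerprint(query), segment_id))

    def put(self, query: dict, segment_id: str, result) -> None:
        self._cache[(self._fingerprint(query), segment_id)] = result

cache = SegmentResultCache()
q1 = {"queryType": "timeseries", "filter": "country = 'KR'"}
q2 = {"queryType": "timeseries", "filter": "country = 'US'"}
cache.put(q1, "seg-2021-01", [42])
hit = cache.get(q1, "seg-2021-01")   # same query, same segment: cache hit
miss = cache.get(q2, "seg-2021-01")  # different query: miss, even if the result were equal
```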
   
   > 3. Can the result value of a real-time node be stored directly in a 
historical node without going through deep storage?
   
   No. Every indexing task must go through the steps explained 
[here](https://druid.apache.org/docs/latest/design/architecture.html#indexing-and-handoff).
   
   > 4. When indexing in a real-time node, is the bitmap encoding random or is 
there a criterion (if so, what is the criterion)
   
   Sorry, I don't understand your question. What do you mean by random bitmap 
encoding?
   
   > 5. What is the reference point at which the real-time node index is 
flushed in in-memory or off-heap memory?
   
   As with question 1, I don't remember much about the realtime node. 
Kafka/Kinesis tasks use [`maxRowsInMemory` and 
`maxBytesInMemory`](https://druid.apache.org/docs/latest/development/extensions-core/kafka-ingestion.html#kafkasupervisortuningconfig).
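   The flush decision those two properties control can be sketched as follows. This is a simplification, not Druid's source code: buffered rows are persisted (spilled to local disk) as soon as either the row-count or the byte-size limit is reached. Only the property names are taken from the tuningConfig docs; the class and values are hypothetical:

```python
class InMemoryIndex:
    """Toy model of an ingestion task's in-memory buffer and its flush trigger."""

    def __init__(self, max_rows_in_memory: int, max_bytes_in_memory: int):
        # Names mirror Druid's maxRowsInMemory / maxBytesInMemory.
        self.max_rows = max_rows_in_memory
        self.max_bytes = max_bytes_in_memory
        self.rows = 0
        self.bytes = 0
        self.persist_count = 0

    def add(self, row_bytes: int) -> None:
        self.rows += 1
        self.bytes += row_bytes
        # Persist when EITHER limit is reached, whichever comes first.
        if self.rows >= self.max_rows or self.bytes >= self.max_bytes:
            self.persist()

    def persist(self) -> None:
        # In Druid this would spill an immutable chunk to local disk;
        # here we just reset the counters.
        self.persist_count += 1
        self.rows = 0
        self.bytes = 0

idx = InMemoryIndex(max_rows_in_memory=3, max_bytes_in_memory=1000)
for _ in range(7):
    idx.add(row_bytes=10)
# 7 rows against a 3-row limit: two persists fire, one row stays buffered
```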


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
