LakshSingla commented on code in PR #15987:
URL: https://github.com/apache/druid/pull/15987#discussion_r1512901464
##########
processing/src/main/java/org/apache/druid/frame/segment/FrameCursorUtils.java:
##########
@@ -79,60 +79,71 @@ public static Filter buildFilter(@Nullable Filter filter,
Interval interval)
/**
* Writes a {@link Cursor} to a sequence of {@link Frame}. This method iterates over the rows of the cursor,
- * and writes the columns to the frames
- *
- * @param cursor Cursor to write to the frame
- * @param frameWriterFactory Frame writer factory to write to the frame.
* Determines the signature of the rows that are written to the frames
+ * and writes the columns to the frames. The iterable is lazy, and it traverses the required portion of the cursor
+ * as required
*/
- public static Sequence<Frame> cursorToFrames(
- Cursor cursor,
- FrameWriterFactory frameWriterFactory
+ public static Iterable<Frame> cursorToFramesIterable(
+ final Cursor cursor,
+ final FrameWriterFactory frameWriterFactory
)
{
+ return () -> new Iterator<Frame>()
+ {
+ @Override
+ public boolean hasNext()
+ {
+ return !cursor.isDone();
+ }
- return Sequences.simple(
- () -> new Iterator<Frame>()
- {
- @Override
- public boolean hasNext()
- {
- return !cursor.isDone();
- }
-
- @Override
- public Frame next()
- {
- // Makes sure that cursor contains some elements prior. This ensures if no row is written, then the row size
- // is larger than the MemoryAllocators returned by the provided factory
- if (!hasNext()) {
- throw new NoSuchElementException();
+ @Override
+ public Frame next()
+ {
+ // Makes sure that cursor contains some elements prior. This ensures if no row is written, then the row size
+ // is larger than the MemoryAllocators returned by the provided factory
+ if (!hasNext()) {
+ throw new NoSuchElementException();
+ }
+ boolean firstRowWritten = false;
+ Frame frame;
+ try (final FrameWriter frameWriter = frameWriterFactory.newFrameWriter(cursor.getColumnSelectorFactory())) {
+ while (!cursor.isDone()) {
+ if (!frameWriter.addSelection()) {
+ break;
}
- boolean firstRowWritten = false;
- Frame frame;
- try (final FrameWriter frameWriter = frameWriterFactory.newFrameWriter(cursor.getColumnSelectorFactory())) {
- while (!cursor.isDone()) {
- if (!frameWriter.addSelection()) {
- break;
- }
- firstRowWritten = true;
- cursor.advance();
- }
-
- if (!firstRowWritten) {
- throw DruidException
- .forPersona(DruidException.Persona.DEVELOPER)
- .ofCategory(DruidException.Category.CAPACITY_EXCEEDED)
- .build("Subquery's row size exceeds the frame size and therefore cannot write the subquery's "
- + "row to the frame. This is a non-configurable static limit that can only be modified by the "
- + "developer.");
- }
+ firstRowWritten = true;
+ cursor.advance();
+ }
- frame = Frame.wrap(frameWriter.toByteArray());
- }
- return frame;
+ if (!firstRowWritten) {
+ throw DruidException
+ .forPersona(DruidException.Persona.DEVELOPER)
+ .ofCategory(DruidException.Category.CAPACITY_EXCEEDED)
+ .build("Subquery's row size exceeds the frame size and therefore cannot write the subquery's "
Review Comment:
I will update the message with the frame size.
However, I don't think it makes sense to put a corrective action here, given
that this error message is aimed at the `DEVELOPER` persona. Messages aimed at
developers mean that something went wrong here and that we don't expect to hit
this condition. This is a special case in that the user's data shape can
trigger it, but I still think we shouldn't add a corrective action because:
a) If the row > frameSize, it is usually due to a single large array/string
column, not to lots of individual columns.
b) There may not be a way to correct the subquery while preserving its
correctness.
c) Such subqueries won't get limited properly: the individual row size is so
large that we expect to overshoot the memory limits by a lot.
Overall, we don't expect to hit this error message in normal use. If we still
want to present a corrective action, I think we should do both of the
following instead:
a) Change the persona of the error message. I am not a big fan of this, since
hitting the error usually means memory limiting is being used suboptimally:
each row is > 8MB, and rows that large are hard to limit.
b) Make the corrective action "disable memory-based limiting", since this is
a super-specific case (one that we shouldn't be supporting anyway).
WDYT?
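
For reference, a rough sketch of how the message could read once it includes the frame size. This is only an illustration, not the actual change: `FrameSizeMessageSketch` and `rowTooLargeMessage` are hypothetical names, the `8_000_000` value is an assumed frame size based on the 8MB figure above, and the real code would pass the text to the `DruidException` builder instead of returning a `String`.

```java
import java.util.Locale;

public final class FrameSizeMessageSketch
{
  // Hypothetical helper (not part of Druid): formats the existing message text
  // with the frame size, in bytes, added in square brackets.
  public static String rowTooLargeMessage(final long frameSizeBytes)
  {
    return String.format(
        Locale.ROOT,
        "Subquery's row size exceeds the frame size [%d bytes] and therefore cannot write the subquery's "
        + "row to the frame. This is a non-configurable static limit that can only be modified by the "
        + "developer.",
        frameSizeBytes
    );
  }

  public static void main(String[] args)
  {
    // 8_000_000 is an assumed value used only for illustration.
    System.out.println(rowTooLargeMessage(8_000_000L));
  }
}
```

The wording stays developer-facing and adds no corrective action, which matches points a) to c) above.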
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]