Yicong-Huang commented on code in PR #5149:
URL: https://github.com/apache/texera/pull/5149#discussion_r3292312007
##########
common/workflow-core/src/main/scala/org/apache/texera/amber/core/storage/result/iceberg/IcebergDocument.scala:
##########
@@ -125,7 +125,9 @@ private[storage] class IcebergDocument[T >: Null <: AnyRef](
.getOrElse(
return 0
)
- table.newScan().planFiles().iterator().asScala.map(f =>
f.file().recordCount()).sum
+ Using.resource(table.newScan().planFiles()) { tasks =>
Review Comment:
sorry to be the blocker here, but materializing an iterator != consuming an
iterator.
for instance, `sum(range(1000000))` (consume) is cheap because it only keeps
one int. but `list(range(1000000))` (materialize) would leave 1000000 ints in
mem.
so I still suggest we rethink about materializing the iterator.
##########
common/workflow-core/src/main/scala/org/apache/texera/amber/core/storage/result/iceberg/IcebergDocument.scala:
##########
@@ -125,7 +125,9 @@ private[storage] class IcebergDocument[T >: Null <: AnyRef](
.getOrElse(
return 0
)
- table.newScan().planFiles().iterator().asScala.map(f =>
f.file().recordCount()).sum
+ Using.resource(table.newScan().planFiles()) { tasks =>
Review Comment:
sorry to be the blocker here, but materializing an iterator != consuming an
iterator.
for instance, `sum(range(1000000))` (consume) is cheap because it only keeps
one int. but `list(range(1000000))` (materialize) would leave 1000000 ints in
memory.
so I still suggest we rethink about materializing the iterator.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]