taiyang-li commented on issue #4943: URL: https://github.com/apache/incubator-gluten/issues/4943#issuecomment-1993693855
Cause: while the query runs, new byte[1024*1024] is executed 26,200 times, an average of 78 times per task. These allocations take 8 s in total, while the whole query takes only a little over 30 s.

Question: why does the read path use the copying OnHeapCopyShuffleInputStream instead of the zero-copy LowCopyNettyShuffleInputStream?

Call chain:

```
CHColumnarBatchSerializerInstance.deserializeStream
CHStreamReader.CHStreamReader
CHShuffleReadStreamFactory.create
```

``` java
public static ShuffleInputStream create(
    InputStream in, boolean forceCompress, boolean isCustomizedShuffleCodec) {
  final InputStream unwrapped = unwrapInputStream(in, forceCompress, isCustomizedShuffleCodec);
  if (unwrapped != null) {
    return createCompressedShuffleInputStream(in, unwrapped);
  }
  // Nothing could be unwrapped: fall back to the copying on-heap stream.
  return new OnHeapCopyShuffleInputStream(in, false);
}

private static InputStream unwrapInputStream(
    InputStream in, boolean forceCompress, boolean isCustomizedShuffleCodec) {
  if (forceCompress) {
    return unwrapSparkInputStream(in);
  } else if (isCustomizedShuffleCodec) {
    return unwrapSparkWithCompressedInputStream(in);
  }
  // Neither flag is set, so there is no stream to unwrap.
  return null;
}
```

Because my local environment does not set Celeborn as the shuffle manager, unwrapInputStream returns null and the factory ends up creating an OnHeapCopyShuffleInputStream. The current implementation of OnHeapCopyShuffleInputStream is not very efficient, which is what causes the problem described in the issue title.
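
To illustrate why 26,200 allocations of 1 MiB can add up, here is a minimal, self-contained sketch that compares allocating a fresh buffer for every read with reusing a single buffer. It is not Gluten code: BufferReuseSketch, Stats, and both drain methods are hypothetical names, and the assumption that the slow path allocates one array per read is mine, inferred only from the allocation count above.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical illustration only; none of these names exist in Gluten.
final class BufferReuseSketch {
  static final int BUFFER_SIZE = 1024 * 1024; // 1 MiB, as in new byte[1024*1024]

  /** Bytes read from a stream and the number of buffers allocated while reading. */
  static final class Stats {
    final long bytesRead;
    final int allocations;

    Stats(long bytesRead, int allocations) {
      this.bytesRead = bytesRead;
      this.allocations = allocations;
    }
  }

  /** Assumed slow pattern: a new 1 MiB array is allocated for every read call. */
  static Stats drainWithFreshBuffers(InputStream in) {
    long total = 0;
    int allocations = 0;
    try {
      while (true) {
        byte[] buf = new byte[BUFFER_SIZE];
        allocations++;
        int n = in.read(buf, 0, buf.length);
        if (n < 0) {
          break; // end of stream; the last buffer was allocated for nothing
        }
        total += n;
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return new Stats(total, allocations);
  }

  /** Alternative: allocate one buffer up front and reuse it for every read call. */
  static Stats drainWithReusedBuffer(InputStream in) {
    byte[] buf = new byte[BUFFER_SIZE];
    long total = 0;
    try {
      int n;
      while ((n = in.read(buf, 0, buf.length)) >= 0) {
        total += n;
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return new Stats(total, 1);
  }

  public static void main(String[] args) {
    byte[] payload = new byte[3 * BUFFER_SIZE + 17];
    Stats fresh = drainWithFreshBuffers(new ByteArrayInputStream(payload));
    Stats reused = drainWithReusedBuffer(new ByteArrayInputStream(payload));
    System.out.println("fresh:  " + fresh.bytesRead + " bytes, " + fresh.allocations + " buffers");
    System.out.println("reused: " + reused.bytesRead + " bytes, " + reused.allocations + " buffers");
  }
}
```

Both methods read the same number of bytes, but the first allocates one array per read call while the second allocates exactly one. Whether reusing a buffer is safe in the real stream depends on whether callers keep references to the returned data, which this sketch does not model.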