taiyang-li commented on issue #4943:
URL: https://github.com/apache/incubator-gluten/issues/4943#issuecomment-1993693855

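   Cause: while the query runs, `new byte[1024*1024]` is executed 26,200 times, an average of 78 times per task, taking 8s in total, whereas the whole query takes only a little over 30s.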
   
   
   Question: why does the read path use the copying OnHeapCopyShuffleInputStream instead of the zero-copy LowCopyNettyShuffleInputStream?
   
   Call chain:
   ```
   CHColumnarBatchSerializerInstance.deserializeStream
   CHStreamReader.CHStreamReader
   CHShuffleReadStreamFactory.create
   ```
   
   ``` java
     public static ShuffleInputStream create(
         InputStream in, boolean forceCompress, boolean isCustomizedShuffleCodec) {
       final InputStream unwrapped = unwrapInputStream(in, forceCompress, isCustomizedShuffleCodec);
       if (unwrapped != null) {
         return createCompressedShuffleInputStream(in, unwrapped);
       }
       return new OnHeapCopyShuffleInputStream(in, false);
     }
   
     private static InputStream unwrapInputStream(
         InputStream in, boolean forceCompress, boolean isCustomizedShuffleCodec) {
       if (forceCompress) {
         return unwrapSparkInputStream(in);
       } else if (isCustomizedShuffleCodec) {
         return unwrapSparkWithCompressedInputStream(in);
       }
       return null;
     }
   ``` 
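
To make the branch behaviour explicit, here is a minimal, self-contained model of the selection logic quoted above. The class `ShuffleStreamChoice`, the enum `Kind`, and the method `select` are hypothetical names used only for illustration; the sketch also assumes the unwrap helpers return a non-null stream whenever their flag is set, which the quoted code does not confirm.

```java
// Simplified model of the factory's decision. Only the two boolean flags
// come from the quoted code; every other name here is hypothetical.
final class ShuffleStreamChoice {
  enum Kind { COMPRESSED, ON_HEAP_COPY }

  static Kind select(boolean forceCompress, boolean isCustomizedShuffleCodec) {
    // Mirrors unwrapInputStream: a non-null result is only possible when
    // at least one flag is set (assumed to always succeed in that case).
    boolean mayUnwrap = forceCompress || isCustomizedShuffleCodec;
    return mayUnwrap ? Kind.COMPRESSED : Kind.ON_HEAP_COPY;
  }

  public static void main(String[] args) {
    System.out.println(select(false, false)); // both flags off
    System.out.println(select(false, true));  // customized codec on
  }
}
```

With both flags off, the model always lands on the copy path, which matches the fall-through `return new OnHeapCopyShuffleInputStream(in, false)` in `create`.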
   
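   Because celeborn is not configured as the shuffle manager in my local environment, the read path ends up in OnHeapCopyShuffleInputStream. The current implementation of OnHeapCopyShuffleInputStream is not very efficient, which ultimately causes the problem described in the issue title.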
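
As a rough illustration of why repeated `new byte[1024*1024]` calls add up, the sketch below drains a stream in two ways: allocating a fresh 1 MiB array for every read, and reusing a single array. It is not the OnHeapCopyShuffleInputStream code and does not claim to mirror it; the class `BufferReuseSketch` and both method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

final class BufferReuseSketch {
  static final int CHUNK = 1024 * 1024; // same size as the allocation in the profile

  // Allocates a new 1 MiB array per read call. Returns {bytesRead, allocations}.
  static long[] drainAllocating(InputStream in) {
    long bytes = 0;
    long allocations = 0;
    try {
      while (true) {
        byte[] buf = new byte[CHUNK];
        allocations++;
        int n = in.read(buf, 0, buf.length);
        if (n < 0) {
          break; // end of stream
        }
        bytes += n;
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return new long[] {bytes, allocations};
  }

  // Allocates one array up front and reuses it. Returns {bytesRead, allocations}.
  static long[] drainReusing(InputStream in) {
    long bytes = 0;
    byte[] buf = new byte[CHUNK];
    try {
      int n;
      while ((n = in.read(buf, 0, buf.length)) >= 0) {
        bytes += n;
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return new long[] {bytes, 1};
  }

  public static void main(String[] args) {
    byte[] data = new byte[3 * CHUNK + 5];
    long[] a = drainAllocating(new ByteArrayInputStream(data));
    long[] r = drainReusing(new ByteArrayInputStream(data));
    System.out.println("allocating: bytes=" + a[0] + " allocations=" + a[1]);
    System.out.println("reusing:    bytes=" + r[0] + " allocations=" + r[1]);
  }
}
```

Both variants read the same number of bytes; only the number of 1 MiB allocations differs, and each such allocation also has to be zero-filled by the JVM and later collected.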
   
   

