Re: [I] [CH] New byte buffer takes most of time in SourceFromJavaIter::generate [incubator-gluten]

2024-03-18 Thread via GitHub


baibaichen closed issue #4943: [CH] New byte buffer takes most of time in SourceFromJavaIter::generate
URL: https://github.com/apache/incubator-gluten/issues/4943


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@gluten.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@gluten.apache.org
For additional commands, e-mail: commits-h...@gluten.apache.org



Re: [I] [CH] New byte buffer takes most of time in SourceFromJavaIter::generate [incubator-gluten]

2024-03-18 Thread via GitHub


zzcclp commented on issue #4943:
URL: 
https://github.com/apache/incubator-gluten/issues/4943#issuecomment-2003077901

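   > Cause: while the query runs there are 26,200 `new byte[1024*1024]` operations, about 78 per task on average, taking 8 s in total, while the whole query only takes 30+ s.
   >
   > Question: why does the read go through the copying OnHeapCopyShuffleInputStream instead of the zero-copy LowCopyNettyShuffleInputStream?
   >
   > Call chain: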
   > 
   > ```
   > CHColumnarBatchSerializerInstance.deserializeStream
   > CHStreamReader.CHStreamReader
   > CHShuffleReadStreamFactory.create
   > ```
   > 
   > ```java
   > public static ShuffleInputStream create(
   >     InputStream in, boolean forceCompress, boolean isCustomizedShuffleCodec) {
   >   final InputStream unwrapped =
   >       unwrapInputStream(in, forceCompress, isCustomizedShuffleCodec);
   >   if (unwrapped != null) {
   >     return createCompressedShuffleInputStream(in, unwrapped);
   >   }
   >   return new OnHeapCopyShuffleInputStream(in, false);
   > }
   >
   > private static InputStream unwrapInputStream(
   >     InputStream in, boolean forceCompress, boolean isCustomizedShuffleCodec) {
   >   if (forceCompress) {
   >     return unwrapSparkInputStream(in);
   >   } else if (isCustomizedShuffleCodec) {
   >     return unwrapSparkWithCompressedInputStream(in);
   >   }
   >   return null;
   > }
   > ```
   > 
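   > Because my local environment does not set Celeborn as the shuffle manager, the read ends up in OnHeapCopyShuffleInputStream. The current implementation of OnHeapCopyShuffleInputStream is not very efficient, which leads to the problem described in the title.
   
   You may need to check your local call chain here. It should go through LowCopyFileSegmentShuffleInputStream, because the data is read directly from a local file, so that is the path one would expect.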





Re: [I] [CH] New byte buffer takes most of time in SourceFromJavaIter::generate [incubator-gluten]

2024-03-13 Thread via GitHub


taiyang-li closed issue #4943: [CH] New byte buffer takes most of time in SourceFromJavaIter::generate
URL: https://github.com/apache/incubator-gluten/issues/4943





Re: [I] [CH] New byte buffer takes most of time in SourceFromJavaIter::generate [incubator-gluten]

2024-03-13 Thread via GitHub


taiyang-li commented on issue #4943:
URL: 
https://github.com/apache/incubator-gluten/issues/4943#issuecomment-1993717645

   After changing the configuration to `--conf spark.shuffle.manager=org.apache.spark.shuffle.gluten.celeborn.CelebornShuffleManager`, the flame graph looks like this:
   
![image](https://github.com/apache/incubator-gluten/assets/8181003/1b2c0fba-e12e-4eef-8922-4230be948c22)
   





Re: [I] [CH] New byte buffer takes most of time in SourceFromJavaIter::generate [incubator-gluten]

2024-03-13 Thread via GitHub


taiyang-li commented on issue #4943:
URL: 
https://github.com/apache/incubator-gluten/issues/4943#issuecomment-1993693855

   Cause: while the query runs there are 26,200 `new byte[1024*1024]` operations, about 78 per task on average, taking 8 s in total, while the whole query only takes 30+ s.
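
To put those figures in perspective, here is a small back-of-the-envelope calculation. It only uses the three numbers quoted in the comment (26,200 allocations, 78 per task, 8 s in total) plus the 1 MiB array size. The class and method names are made up for illustration, and whether the 8 s is wall-clock time or time summed across tasks is not stated in the thread, so the derived values are rough.

```java
// Rough arithmetic on the numbers reported in the comment; not a measurement.
final class AllocationCost {
    static final long ALLOCATIONS = 26_200;                 // new byte[1024*1024] calls
    static final long BYTES_PER_ALLOCATION = 1024L * 1024L; // 1 MiB each
    static final long ALLOCATIONS_PER_TASK = 78;            // reported average per task
    static final double TOTAL_SECONDS = 8.0;                // reported total time

    // Total heap memory allocated (and zero-filled by the JVM), in GiB.
    static double totalGiB() {
        return ALLOCATIONS * BYTES_PER_ALLOCATION / (1024.0 * 1024.0 * 1024.0);
    }

    // Average cost of a single 1 MiB allocation, in milliseconds.
    static double millisPerAllocation() {
        return TOTAL_SECONDS * 1000.0 / ALLOCATIONS;
    }

    // Number of tasks implied by the total and the per-task average.
    static double impliedTasks() {
        return (double) ALLOCATIONS / ALLOCATIONS_PER_TASK;
    }

    public static void main(String[] args) {
        System.out.printf("allocated: %.1f GiB%n", totalGiB());
        System.out.printf("per allocation: %.2f ms%n", millisPerAllocation());
        System.out.printf("implied tasks: %.0f%n", impliedTasks());
    }
}
```

In other words, roughly 25.6 GiB of short-lived arrays are allocated across about 336 tasks, at about 0.3 ms per array.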
   
   
   Question: why does the read go through the copying OnHeapCopyShuffleInputStream instead of the zero-copy LowCopyNettyShuffleInputStream?
   
   Call chain:
   ```
   CHColumnarBatchSerializerInstance.deserializeStream
   CHStreamReader.CHStreamReader
   CHShuffleReadStreamFactory.create
   ```
   
   ```java
   public static ShuffleInputStream create(
       InputStream in, boolean forceCompress, boolean isCustomizedShuffleCodec) {
     final InputStream unwrapped =
         unwrapInputStream(in, forceCompress, isCustomizedShuffleCodec);
     if (unwrapped != null) {
       return createCompressedShuffleInputStream(in, unwrapped);
     }
     return new OnHeapCopyShuffleInputStream(in, false);
   }

   private static InputStream unwrapInputStream(
       InputStream in, boolean forceCompress, boolean isCustomizedShuffleCodec) {
     if (forceCompress) {
       return unwrapSparkInputStream(in);
     } else if (isCustomizedShuffleCodec) {
       return unwrapSparkWithCompressedInputStream(in);
     }
     return null;
   }
   ```
   
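   Because my local environment does not set Celeborn as the shuffle manager, the read ends up in OnHeapCopyShuffleInputStream. The current implementation of OnHeapCopyShuffleInputStream is not very efficient, which leads to the problem described in the title.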
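
The branch selection in the two methods quoted above can be reduced to a small truth table. The sketch below is a simplified model, not Gluten code: the `ShuffleStreamChoice` class and its `Kind` enum are hypothetical names, and it assumes the two unwrap helpers return a non-null stream whenever they are called, which the thread does not confirm.

```java
// Simplified model of which stream type create() would pick for each flag combination.
final class ShuffleStreamChoice {
    enum Kind { COMPRESSED_UNWRAPPED, ON_HEAP_COPY }

    static Kind choose(boolean forceCompress, boolean isCustomizedShuffleCodec) {
        // unwrapInputStream() only attempts an unwrap when one of the flags is set;
        // otherwise it returns null and create() falls back to the copying stream.
        boolean unwrapAttempted = forceCompress || isCustomizedShuffleCodec;
        return unwrapAttempted ? Kind.COMPRESSED_UNWRAPPED : Kind.ON_HEAP_COPY;
    }

    public static void main(String[] args) {
        boolean[] flags = {false, true};
        for (boolean force : flags) {
            for (boolean custom : flags) {
                System.out.println(
                    "forceCompress=" + force
                        + " isCustomizedShuffleCodec=" + custom
                        + " -> " + choose(force, custom));
            }
        }
    }
}
```

Under these assumptions, only the combination with both flags false lands on the on-heap copy path, which matches the behaviour described for the local environment.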
   
   





Re: [I] [CH] New byte buffer takes most of time in SourceFromJavaIter::generate [incubator-gluten]

2024-03-12 Thread via GitHub


zhanglistar commented on issue #4943:
URL: 
https://github.com/apache/incubator-gluten/issues/4943#issuecomment-1993376883

   
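   `optoruntime::new_array_c` is probably slow because the `memory.m_capacity` passed in is too large. In addition, the JDK memsets the newly allocated memory, so this function ends up taking too much time.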
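
One way to limit both effects is to allocate a single bounded heap buffer once and copy in chunks, so the array size no longer follows the requested capacity. The following is a minimal sketch under several assumptions, not the fix adopted in Gluten: `BoundedCopyReader` is a hypothetical class, a direct `ByteBuffer` stands in for the native `destAddress`, and unlike the original `read` it loops until the request is filled or the stream ends.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;

// Copies stream data to an off-heap destination through one small, reused heap buffer.
final class BoundedCopyReader {
    private final InputStream in;
    private final byte[] buffer; // allocated (and zero-filled by the JVM) exactly once

    BoundedCopyReader(InputStream in, int chunkSize) {
        this.in = in;
        this.buffer = new byte[chunkSize];
    }

    int bufferSize() {
        return buffer.length;
    }

    // Copies up to maxReadSize bytes into dest and returns the number of bytes copied
    // (0 when the stream is already exhausted).
    long read(ByteBuffer dest, long maxReadSize) {
        long limit = Math.min(maxReadSize, dest.remaining());
        long total = 0;
        try {
            while (total < limit) {
                int want = (int) Math.min(buffer.length, limit - total);
                int n = in.read(buffer, 0, want);
                if (n <= 0) {
                    break; // end of stream
                }
                dest.put(buffer, 0, n); // heap -> off-heap copy, one chunk at a time
                total += n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }
}
```

For example, `new BoundedCopyReader(in, 64 * 1024).read(ByteBuffer.allocateDirect(1 << 20), 1 << 20)` fills a 1 MiB destination while only ever holding a 64 KiB heap array. This still copies every byte twice, so it reduces allocation and zeroing cost but is not a substitute for a zero-copy stream.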





[I] [CH] New byte buffer takes most of time in SourceFromJavaIter::generate [incubator-gluten]

2024-03-12 Thread via GitHub


taiyang-li opened a new issue, #4943:
URL: https://github.com/apache/incubator-gluten/issues/4943

   ### Description
   
   
![d722f3fabeb6881fe8b49f58cf0eb6c](https://github.com/apache/incubator-gluten/assets/8181003/8244ef97-fd00-4838-a341-adcb669847ec)
  
   
   
   ```cpp
   bool ReadBufferFromJavaInputStream::nextImpl()
   {
       int count = readFromJava();
       if (count > 0)
           working_buffer.resize(count);
       return count > 0;
   }

   int ReadBufferFromJavaInputStream::readFromJava() const
   {
       GET_JNIENV(env)
       jint count = safeCallIntMethod(
           env, java_in, ShuffleReader::input_stream_read,
           reinterpret_cast<jlong>(working_buffer.begin()), memory.m_capacity);
       CLEAN_JNIENV
       return count;
   }
   ```
   
   ```java
   @Override
   public long read(long destAddress, long maxReadSize) {
     return GlutenException.wrap(
         () -> {
           int maxReadSize32 = Math.toIntExact(maxReadSize);
           if (buffer == null || maxReadSize32 > buffer.length) {
             this.buffer = new byte[maxReadSize32];
           }
           // The code conducts copy as long as 'in' wraps off-heap data,
           // which is about to be moved to heap
           int read = in.read(buffer, 0, maxReadSize32);
           if (read == -1 || read == 0) {
             return 0;
           }
           // The code conducts copy, from heap to off-heap
           // memCopyFromHeap(buffer, destAddress, read);
           PlatformDependent.copyMemory(buffer, 0, destAddress, read);
           bytesRead += read;
           return read;
         });
   }
   ```
   
   
   
   

