[GitHub] [hudi] vburenin commented on pull request #2440: Fixed suboptimal implementation of a magic sequence search
vburenin commented on pull request #2440:
URL: https://github.com/apache/hudi/pull/2440#issuecomment-760299104

> @vburenin Left a comment to restructure the code to support buffering, are you going to look into improving the O(m*n) search ?

At this point I don't think it is necessary: the search pattern is trivial, so the actual complexity is closer to O(n). The slowest part of the original code is the memory copies and the overhead associated with them.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
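To illustrate the complexity argument above, here is a minimal sketch of a naive magic-sequence scan over an in-memory buffer that also counts byte comparisons. It is not the code from the pull request: the class `MagicScan`, the `comparisons` counter, and the marker value `#HUDI#` are assumptions made for illustration only. When the first byte of the marker is rare in the data, almost every candidate position is rejected after a single comparison, so the total work stays close to one comparison per input byte even though the worst case is O(m*n).

```java
import java.nio.charset.StandardCharsets;

public class MagicScan {
    // Assumed marker for illustration; the thread does not state the actual magic bytes.
    static final byte[] MAGIC = "#HUDI#".getBytes(StandardCharsets.US_ASCII);

    // Byte comparisons made by the most recent indexOf call (demo only, not thread-safe).
    static long comparisons = 0;

    /** Returns the index of the first occurrence of magic in data, or -1 if absent. */
    static int indexOf(byte[] data, byte[] magic) {
        comparisons = 0;
        for (int i = 0; i + magic.length <= data.length; i++) {
            int j = 0;
            while (j < magic.length) {
                comparisons++;
                if (data[i + j] != magic[j]) {
                    break; // mismatch: give up on this start position
                }
                j++;
            }
            if (j == magic.length) {
                return i; // every byte of the marker matched
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // 1 MiB of zero bytes with the marker placed at the very end.
        byte[] data = new byte[1 << 20];
        System.arraycopy(MAGIC, 0, data, data.length - MAGIC.length, MAGIC.length);

        int pos = indexOf(data, MAGIC);
        System.out.println("found at " + pos + " after " + comparisons
                + " comparisons over " + data.length + " bytes");
    }
}
```

The worst case would need data made of long partial matches, for example runs of the marker's first byte, which is unlikely for a short, distinctive marker.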
vburenin commented on pull request #2440:
URL: https://github.com/apache/hudi/pull/2440#issuecomment-759798734

```
LOG.info("Class Name: " + fsDataInputStream.getWrappedStream().getClass().getName());
```

```
473840 [Executor task launch worker for task 267] INFO org.apache.hudi.common.table.log.HoodieLogFileReader - Class Name: org.apache.hadoop.fs.FSDataInputStream
```
vburenin commented on pull request #2440:
URL: https://github.com/apache/hudi/pull/2440#issuecomment-759779766

I am in the process of building and trying it. The original search method is still O(m*n), which is also worth optimizing.
vburenin commented on pull request #2440:
URL: https://github.com/apache/hudi/pull/2440#issuecomment-759774305

According to this stack trace, that is not the case:

```
    at com.google.cloud.hadoop.repackaged.gcs.com.google.api.services.storage.Storage$Objects$Get.executeMedia(Storage.java:6981)
    at com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.gcsio.GoogleCloudStorageReadChannel.openStream(GoogleCloudStorageReadChannel.java:967)
    at com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.gcsio.GoogleCloudStorageReadChannel.openContentChannel(GoogleCloudStorageReadChannel.java:772)
    at com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.gcsio.GoogleCloudStorageReadChannel.performLazySeek(GoogleCloudStorageReadChannel.java:763)
    at com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.gcsio.GoogleCloudStorageReadChannel.read(GoogleCloudStorageReadChannel.java:365)
    at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFSInputStream.read(GoogleHadoopFSInputStream.java:131)
    - locked <0x000616319fb8> (a com.google.cloud.hadoop.fs.gcs.GoogleHadoopFSInputStream)
    at java.io.DataInputStream.read(DataInputStream.java:149)
    at java.io.DataInputStream.readFully(DataInputStream.java:195)
    at org.apache.hudi.common.table.log.HoodieLogFileReader.hasNextMagic(HoodieLogFileReader.java:339)
    at org.apache.hudi.common.table.log.HoodieLogFileReader.scanForNextAvailableBlockOffset(HoodieLogFileReader.java:280)
```
vburenin commented on pull request #2440:
URL: https://github.com/apache/hudi/pull/2440#issuecomment-759766940

@vinothchandar I suspect the buffering of the underlying reads is the responsibility of the FileSystem driver, isn't it? If it is, GCS is clearly not buffering them, which shows up as a significant time gap (80 ms) between the calls to the readFully method.

```
404120 [Executor task launch worker for task 268] INFO org.apache.hudi.common.table.log.HoodieLogFileReader - Current magic position: 263
404183 [Executor task launch worker for task 268] INFO org.apache.hudi.common.table.log.HoodieLogFileReader - Current magic position: 264
404246 [Executor task launch worker for task 268] INFO org.apache.hudi.common.table.log.HoodieLogFileReader - Current magic position: 265
```
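The log excerpt above shows consecutive magic positions roughly 63 ms apart by timestamp, which is what one would expect if every small read reaches the remote store. The following is a rough sketch of why buffering matters in that situation, using only `java.io` classes. It does not use the Hadoop or GCS connector APIs: `CountingInputStream` is a hypothetical stand-in for an unbuffered remote stream, and the data size and buffer size are arbitrary assumptions. It counts how many read calls reach the underlying stream with and without a `BufferedInputStream` in front of it.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

public class BufferingDemo {
    /** Hypothetical stand-in for a remote stream: counts every read call that reaches it. */
    static final class CountingInputStream extends InputStream {
        private final InputStream in;
        int calls = 0;

        CountingInputStream(InputStream in) {
            this.in = in;
        }

        @Override
        public int read() throws IOException {
            calls++;
            return in.read();
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            calls++;
            return in.read(b, off, len);
        }
    }

    /** Reads n bytes one at a time, the way a byte-by-byte scan would. */
    private static void readByteByByte(InputStream in, int n) {
        try {
            for (int i = 0; i < n; i++) {
                if (in.read() < 0) {
                    throw new IOException("unexpected end of stream");
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Underlying read calls when scanning the raw stream directly. */
    static int unbufferedCalls(byte[] data) {
        CountingInputStream raw = new CountingInputStream(new ByteArrayInputStream(data));
        readByteByByte(raw, data.length);
        return raw.calls;
    }

    /** Underlying read calls when the same scan goes through a buffer. */
    static int bufferedCalls(byte[] data, int bufferSize) {
        CountingInputStream raw = new CountingInputStream(new ByteArrayInputStream(data));
        readByteByByte(new BufferedInputStream(raw, bufferSize), data.length);
        return raw.calls;
    }

    public static void main(String[] args) {
        byte[] data = new byte[20_000];
        System.out.println("unbuffered underlying reads: " + unbufferedCalls(data));
        System.out.println("buffered underlying reads:   " + bufferedCalls(data, 8192));
    }
}
```

If each underlying call costs tens of milliseconds, as the timestamps suggest, the number of calls dominates the scan time far more than the comparison logic does.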