[GitHub] [hudi] vburenin commented on pull request #2440: Fixed suboptimal implementation of a magic sequence search

2021-01-14 Thread GitBox


vburenin commented on pull request #2440:
URL: https://github.com/apache/hudi/pull/2440#issuecomment-760299104


   > @vburenin Left a comment to restructure the code to support buffering; are you going to look into improving the O(m*n) search?

   At this point I don't think it is necessary: the search pattern is trivial, so the actual complexity is closer to O(n). The slowest part of the original code is the memory copying and the overhead that comes with it.
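For illustration, a minimal Java sketch of the idea discussed above, assuming the goal is to find a short magic sequence (for example `#HUDI#`, assumed here) in a stream. It fills one large buffer per read call and does a plain byte-by-byte comparison inside it, carrying the last `magic.length - 1` bytes over so a match that spans two reads is not missed. `BufferedMagicScanner` and `findMagic` are hypothetical names, not Hudi APIs, and this is not the code from the PR.

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper: scans a stream for a magic byte sequence using one
// large read buffer instead of a readFully() call per candidate offset.
final class BufferedMagicScanner {
  private BufferedMagicScanner() {}

  /** Returns the stream offset of the first match, or -1 if there is none. */
  static long findMagic(InputStream in, byte[] magic, int bufferSize) throws IOException {
    if (magic.length == 0 || bufferSize < magic.length) {
      throw new IllegalArgumentException("need a non-empty magic and bufferSize >= magic.length");
    }
    byte[] buf = new byte[bufferSize];
    int filled = 0;    // number of valid bytes currently in buf
    long bufStart = 0; // stream offset of buf[0]
    while (true) {
      int n = in.read(buf, filled, buf.length - filled);
      if (n < 0) {
        return -1; // EOF: the carried-over tail is shorter than magic, so no match remains
      }
      filled += n;
      // Naive comparison inside the buffer: O(n * m) in the worst case, but for a
      // short magic most candidates fail on the first byte, so it is close to O(n).
      int lastStart = filled - magic.length;
      for (int i = 0; i <= lastStart; i++) {
        int j = 0;
        while (j < magic.length && buf[i + j] == magic[j]) {
          j++;
        }
        if (j == magic.length) {
          return bufStart + i;
        }
      }
      // Keep the last magic.length - 1 bytes so a match straddling two reads is found next time.
      int keep = Math.min(filled, magic.length - 1);
      System.arraycopy(buf, filled - keep, buf, 0, keep);
      bufStart += filled - keep;
      filled = keep;
    }
  }
}
```

With a buffer of a few KiB, the number of calls that reach the underlying stream drops from one per candidate offset to one per buffer fill, which is where the time goes on a remote file system.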



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] vburenin commented on pull request #2440: Fixed suboptimal implementation of a magic sequence search

2021-01-13 Thread GitBox


vburenin commented on pull request #2440:
URL: https://github.com/apache/hudi/pull/2440#issuecomment-759798734


   ```
   LOG.info("Class Name: " + fsDataInputStream.getWrappedStream().getClass().getName());
   ```
   ```
   473840 [Executor task launch worker for task 267] INFO  org.apache.hudi.common.table.log.HoodieLogFileReader  - Class Name: org.apache.hadoop.fs.FSDataInputStream
   ```
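The log shows that the wrapped stream is another `FSDataInputStream` rather than a buffering wrapper. A minimal sketch of adding a buffer only when one is missing, using plain `java.io` types; `StreamBuffering.ensureBuffered` is a hypothetical helper, and like the single `getWrappedStream()` call above it only inspects the outermost stream. A `BufferedInputStream` is not seekable, so Hadoop code that still needs `seek()` would require a seekable buffering wrapper instead.

```java
import java.io.BufferedInputStream;
import java.io.InputStream;

// Hypothetical helper: adds a read buffer only when the outermost stream is
// not already a BufferedInputStream, so small reads (such as a 6-byte
// readFully()) are served from memory instead of the underlying stream.
final class StreamBuffering {
  private StreamBuffering() {}

  static InputStream ensureBuffered(InputStream in, int bufferSize) {
    if (in instanceof BufferedInputStream) {
      return in; // already buffered; avoid stacking a second buffer
    }
    return new BufferedInputStream(in, bufferSize);
  }
}
```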
   
   







[GitHub] [hudi] vburenin commented on pull request #2440: Fixed suboptimal implementation of a magic sequence search

2021-01-13 Thread GitBox


vburenin commented on pull request #2440:
URL: https://github.com/apache/hudi/pull/2440#issuecomment-759779766


   In the process of building and trying it.
   
   The original search method is still O(m*n), which is also worth optimizing.
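If the O(m*n) worst case is worth removing, one standard option (a swapped-in technique, not what the PR implements) is Knuth-Morris-Pratt, which finds a pattern of length m in n bytes in O(n + m) time without moving backwards in the input. A minimal sketch over byte arrays; `KmpSearch` is a hypothetical name.

```java
// Knuth-Morris-Pratt search over byte arrays: O(n + m) time, because the
// text index only moves forward and the fallback through the failure table
// is bounded by the number of bytes matched so far.
final class KmpSearch {
  private KmpSearch() {}

  /** failure[i] = length of the longest proper prefix of pattern[0..i] that is also its suffix. */
  static int[] failureTable(byte[] pattern) {
    int[] failure = new int[pattern.length];
    int k = 0;
    for (int i = 1; i < pattern.length; i++) {
      while (k > 0 && pattern[i] != pattern[k]) {
        k = failure[k - 1];
      }
      if (pattern[i] == pattern[k]) {
        k++;
      }
      failure[i] = k;
    }
    return failure;
  }

  /** Returns the index of the first occurrence of pattern in text, or -1. */
  static int indexOf(byte[] text, byte[] pattern) {
    if (pattern.length == 0) {
      return 0;
    }
    int[] failure = failureTable(pattern);
    int matched = 0; // number of pattern bytes matched so far
    for (int i = 0; i < text.length; i++) {
      while (matched > 0 && text[i] != pattern[matched]) {
        matched = failure[matched - 1]; // fall back without re-reading text
      }
      if (text[i] == pattern[matched]) {
        matched++;
      }
      if (matched == pattern.length) {
        return i - pattern.length + 1;
      }
    }
    return -1;
  }
}
```

Because `matched` is the only state carried from one byte to the next, the same loop could in principle continue across buffer boundaries of a stream, which suits a scanner that never seeks backwards.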







[GitHub] [hudi] vburenin commented on pull request #2440: Fixed suboptimal implementation of a magic sequence search

2021-01-13 Thread GitBox


vburenin commented on pull request #2440:
URL: https://github.com/apache/hudi/pull/2440#issuecomment-759774305


   According to this stack trace, that is not the case. The readFully() call in hasNextMagic goes through DataInputStream straight into the GCS read channel, which performs a lazy seek and opens a new content stream:
   ```
   at com.google.cloud.hadoop.repackaged.gcs.com.google.api.services.storage.Storage$Objects$Get.executeMedia(Storage.java:6981)
   at com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.gcsio.GoogleCloudStorageReadChannel.openStream(GoogleCloudStorageReadChannel.java:967)
   at com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.gcsio.GoogleCloudStorageReadChannel.openContentChannel(GoogleCloudStorageReadChannel.java:772)
   at com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.gcsio.GoogleCloudStorageReadChannel.performLazySeek(GoogleCloudStorageReadChannel.java:763)
   at com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.gcsio.GoogleCloudStorageReadChannel.read(GoogleCloudStorageReadChannel.java:365)
   at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFSInputStream.read(GoogleHadoopFSInputStream.java:131)
   - locked <0x000616319fb8> (a com.google.cloud.hadoop.fs.gcs.GoogleHadoopFSInputStream)
   at java.io.DataInputStream.read(DataInputStream.java:149)
   at java.io.DataInputStream.readFully(DataInputStream.java:195)
   at org.apache.hudi.common.table.log.HoodieLogFileReader.hasNextMagic(HoodieLogFileReader.java:339)
   at org.apache.hudi.common.table.log.HoodieLogFileReader.scanForNextAvailableBlockOffset(HoodieLogFileReader.java:280)
   ```
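The trace passes through `performLazySeek` and `openStream`, which suggests a read after a seek can open a new content stream. A simplified model of why that hurts, assuming the scan seeks to every candidate offset and reads `magic.length` bytes there (so each step is a short backward seek). `LazySeekSource` and `SeekComparison` are hypothetical classes, and a real connector may serve some seeks without reopening.

```java
import java.util.Arrays;

// Hypothetical model of a remote stream with lazy seeks: seek() only records
// the target, and the next read() "reopens" the stream (counted in opens)
// when the target differs from the current position of the open stream.
final class LazySeekSource {
  private final byte[] data;
  private long streamPos = -1;   // position of the open stream; -1 = not open yet
  private long requestedPos = 0; // target recorded by the last seek()
  int opens = 0;                 // simulated (re)opens, e.g. new HTTP requests

  LazySeekSource(byte[] data) {
    this.data = data;
  }

  void seek(long pos) {
    requestedPos = pos;
  }

  /** Reads up to len bytes at the requested position; returns -1 at EOF. */
  int read(byte[] b, int off, int len) {
    if (streamPos != requestedPos) {
      opens++;
      streamPos = requestedPos;
    }
    if (streamPos >= data.length) {
      return -1;
    }
    int n = (int) Math.min(len, data.length - streamPos);
    System.arraycopy(data, (int) streamPos, b, off, n);
    streamPos += n;
    requestedPos = streamPos; // sequential reads continue without reopening
    return n;
  }
}

final class SeekComparison {
  private SeekComparison() {}

  /** Naive scan: seek to each candidate offset and read magic.length bytes there. */
  static long naiveScan(LazySeekSource src, byte[] magic, int length) {
    byte[] window = new byte[magic.length];
    for (long pos = 0; pos + magic.length <= length; pos++) {
      src.seek(pos);
      // One read suffices: the model returns min(len, remaining) bytes.
      int got = src.read(window, 0, window.length);
      if (got == magic.length && Arrays.equals(window, magic)) {
        return pos;
      }
    }
    return -1;
  }

  /** Sequential pass: one seek to the start, then chunked reads until EOF. */
  static void sequentialPass(LazySeekSource src, int chunkSize) {
    byte[] chunk = new byte[chunkSize];
    src.seek(0);
    while (src.read(chunk, 0, chunk.length) >= 0) {
      // A real scanner would search the chunk here.
    }
  }
}
```

In this model, finding a magic at offset 500 with the naive scan costs 501 opens, while the sequential pass opens the stream once.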







[GitHub] [hudi] vburenin commented on pull request #2440: Fixed suboptimal implementation of a magic sequence search

2021-01-13 Thread GitBox


vburenin commented on pull request #2440:
URL: https://github.com/apache/hudi/pull/2440#issuecomment-759766940


   @vinothchandar I suspect buffering of the underlying reads is the FileSystem driver's job, isn't it? If so, GCS is clearly not buffering them: consecutive calls to the readFully method are separated by a significant delay (on the order of 60 to 80 ms; the log lines below are 63 ms apart), and each call advances the magic position by only one byte.
   ```
   404120 [Executor task launch worker for task 268] INFO  org.apache.hudi.common.table.log.HoodieLogFileReader  - Current magic position: 263
   404183 [Executor task launch worker for task 268] INFO  org.apache.hudi.common.table.log.HoodieLogFileReader  - Current magic position: 264
   404246 [Executor task launch worker for task 268] INFO  org.apache.hudi.common.table.log.HoodieLogFileReader  - Current magic position: 265
   ```
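Whether a buffering layer is present can also be checked by counting how many read calls reach the stream underneath. A minimal sketch using `java.io` only: `CountingInputStream` and `ReadCounting` are hypothetical helpers, the in-memory `ByteArrayInputStream` stands in for the GCS stream, and `BufferedInputStream` stands in for whatever buffering the FileSystem driver may or may not provide.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Counts the read calls that reach the wrapped stream; stands in for a
// remote stream where every call may cost a network round trip.
final class CountingInputStream extends FilterInputStream {
  int calls = 0;

  CountingInputStream(InputStream in) {
    super(in);
  }

  @Override
  public int read() throws IOException {
    calls++;
    return super.read();
  }

  @Override
  public int read(byte[] b, int off, int len) throws IOException {
    calls++;
    return super.read(b, off, len);
  }
}

final class ReadCounting {
  private ReadCounting() {}

  /** Performs `count` sequential 6-byte readFully() calls; returns the underlying read count. */
  static int underlyingReads(byte[] data, int count, boolean buffered) throws IOException {
    CountingInputStream counting = new CountingInputStream(new ByteArrayInputStream(data));
    InputStream source = buffered ? new BufferedInputStream(counting, 8192) : counting;
    try (DataInputStream in = new DataInputStream(source)) {
      byte[] window = new byte[6];
      for (int i = 0; i < count; i++) {
        in.readFully(window);
      }
    }
    return counting.calls;
  }
}
```

For 10,000 sequential 6-byte `readFully()` calls over 60,000 bytes, the unbuffered path issues 10,000 underlying reads, while the 8 KiB buffer issues 8 (one per refill). When each underlying call costs tens of milliseconds, as in the log above, that difference dominates.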


