steveloughran commented on PR #3452:
URL: https://github.com/apache/parquet-java/pull/3452#issuecomment-4299734733

   latest numbers
   
   before adding the UUID read (but with that long column)
   ```
   Benchmark                                           (tableType)  Mode  Cnt     Score    Error  Units
   VariantProjectionBenchmark.readAllRecords            Unshredded    ss   10  1645.855 ± 27.618  ms/op
   VariantProjectionBenchmark.readAllRecords              Shredded    ss   10  2381.192 ± 41.940  ms/op
   VariantProjectionBenchmark.readProjectedFileSchema   Unshredded    ss   10   932.050 ± 37.143  ms/op
   VariantProjectionBenchmark.readProjectedFileSchema     Shredded    ss   10  1596.800 ± 50.421  ms/op
   VariantProjectionBenchmark.readProjectedLeanSchema   Unshredded    ss   10  1750.998 ± 10.982  ms/op
   VariantProjectionBenchmark.readProjectedLeanSchema     Shredded    ss   10   724.603 ± 18.377  ms/op
   ```
   after adding UUID variant field.
   ```
   Benchmark                                           (tableType)  Mode  Cnt     Score     Error  Units
   VariantProjectionBenchmark.readAllRecords            Unshredded    ss   10  1913.765 ±  18.896  ms/op
   VariantProjectionBenchmark.readAllRecords              Shredded    ss   10  2679.631 ± 150.978  ms/op
   VariantProjectionBenchmark.readProjectedFileSchema   Unshredded    ss   10   910.009 ±  33.074  ms/op
   VariantProjectionBenchmark.readProjectedFileSchema     Shredded    ss   10  1616.585 ±  57.818  ms/op
   VariantProjectionBenchmark.readProjectedLeanSchema   Unshredded    ss   10  1777.288 ±  14.049  ms/op
   VariantProjectionBenchmark.readProjectedLeanSchema     Shredded    ss   10   723.679 ±   8.473  ms/op
   ```
   
   points to note
   * Full record read is now ~15% slower on both unshredded and shredded files (about 16% and 13% respectively) just by adding a UUID. Surprisingly expensive.
   * Reading all records on a shredded file is still ~40% slower than on an unshredded one.
   * Same odd behaviours on a projected schema.
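
For anyone wanting to check the percentages, here is a minimal sketch of the arithmetic using the `readAllRecords` scores from the two tables above. The class and method names (`SlowdownCheck`, `percentSlower`) are made up for illustration and are not part of the benchmark code in this PR; the calculation also ignores the reported error bars.

```java
import java.util.Locale;

// Hypothetical helper, not part of the PR: compares two JMH scores (ms/op).
public class SlowdownCheck {

    /** Percentage by which {@code after} is slower than {@code before}. */
    public static double percentSlower(double before, double after) {
        return (after / before - 1.0) * 100.0;
    }

    public static void main(String[] args) {
        // readAllRecords scores copied from the tables above (ms/op).
        double unshreddedBefore = 1645.855;
        double unshreddedAfter = 1913.765;
        double shreddedBefore = 2381.192;
        double shreddedAfter = 2679.631;

        // Cost of adding the UUID field, per table type.
        System.out.printf(Locale.ROOT, "UUID cost, unshredded: %.1f%%%n",
            percentSlower(unshreddedBefore, unshreddedAfter));
        System.out.printf(Locale.ROOT, "UUID cost, shredded:   %.1f%%%n",
            percentSlower(shreddedBefore, shreddedAfter));

        // Shredded vs unshredded full read, after adding the UUID field.
        System.out.printf(Locale.ROOT, "Shredded vs unshredded: %.1f%%%n",
            percentSlower(unshreddedAfter, shreddedAfter));
    }
}
```

With the numbers above this gives roughly 16% and 13% for the UUID cost and about 40% for shredded versus unshredded, which matches the figures quoted in the points to note.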


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

