[ https://issues.apache.org/jira/browse/BEAM-3874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Marian Dvorsky updated BEAM-3874:
---------------------------------
    Description:
AvroIO currently uses CodecFactory.deflateCodec(6) as the default codec for writes.
That compresses well, but is quite expensive in CPU time.
The Snappy codec offers less dense, but much faster compression, and is typically a better CPU/storage tradeoff except for very long-lived files.
We should consider switching the default to Snappy.

  was:
AvroIO currently uses [CodecFactory|https://cs.corp.google.com/piper///depot/google3/third_party/java_src/apache_beam/project_root/sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroIO.java?l=851&gs=kythe%253A%252F%252Fgoogle3%253Flang%253Djava%253Fpath%253Dorg.apache.avro.file.CodecFactory%2523b8636ed8a0357a3a3806fb8ad152a1e38d3b4fa39a6a66d189c040aee9687823&gsn=CodecFactory&ct=xref_usages].[deflateCodec|https://cs.corp.google.com/piper///depot/google3/third_party/java_src/apache_beam/project_root/sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroIO.java?l=851&gs=kythe%253A%252F%252Fgoogle3%253Flang%253Djava%253Fpath%253Dorg.apache.avro.file.CodecFactory%25239fc62def2276bb77cc0f71b21660540e246046da139bfed9b0f33c7f8dbb4550&gsn=deflateCodec&ct=xref_usages](6) as the default codec for writes.
That compresses well, but is quite expensive in CPU time.
The Snappy codec offers less dense, but much faster compression, and is typically a better CPU/storage tradeoff except for very long-lived files.
We should consider switching the default to Snappy.

> Switch AvroIO sink default codec to Snappy
> ------------------------------------------
>
>                 Key: BEAM-3874
>                 URL: https://issues.apache.org/jira/browse/BEAM-3874
>             Project: Beam
>          Issue Type: Improvement
>          Components: io-java-avro
>            Reporter: Marian Dvorsky
>            Assignee: Eugene Kirpichov
>            Priority: Minor
>
> AvroIO currently uses CodecFactory.deflateCodec(6) as the default codec for
> writes.
> That compresses well, but is quite expensive in CPU time.
> The Snappy codec offers less dense, but much faster compression, and is
> typically a better CPU/storage tradeoff except for very long-lived files.
> We should consider switching the default to Snappy.
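
The CPU/storage tradeoff described in this issue can be measured with a small experiment. The sketch below is not Beam or Avro code. It uses only the JDK's java.util.zip classes and assumes that deflate level 1 is an acceptable stand-in for a faster, less dense codec, since Snappy itself is not part of the JDK. The class name CodecTradeoff, its helper methods, and the sample data are hypothetical and exist only for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical helper: compares deflate level 6 (the level named in the issue)
// against level 1, used here as a stand-in for a faster, less dense codec.
final class CodecTradeoff {

    // Builds repetitive, record-like sample data (an assumption, not real Avro output).
    static byte[] sampleData() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 20_000; i++) {
            sb.append("user-").append(i % 100).append(",click,2018-03-16\n");
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    // Compresses the input with deflate at the given level (1 = fastest, 9 = densest).
    static byte[] deflate(byte[] input, int level) {
        Deflater deflater = new Deflater(level);
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Decompresses a complete deflate stream produced by deflate() above.
    static byte[] inflate(byte[] compressed) {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalStateException("truncated or unsupported stream");
                }
                out.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException("corrupt deflate stream", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] data = sampleData();
        // Single-shot timings are noisy; treat the numbers as a rough indication only.
        for (int level : new int[] {6, 1}) {
            long start = System.nanoTime();
            byte[] out = deflate(data, level);
            long micros = (System.nanoTime() - start) / 1_000;
            System.out.printf("level %d: %d -> %d bytes in %d us%n",
                    level, data.length, out.length, micros);
        }
    }
}
```

Independently of the default, a pipeline can already choose its codec per write, for example with AvroIO.write(...).withCodec(CodecFactory.snappyCodec()), so the change proposed here would only affect writes that do not set a codec explicitly.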