Sushant Pandey created GOBBLIN-383:
--------------------------------------

             Summary: Compaction job's output is not compressed
                 Key: GOBBLIN-383
                 URL: https://issues.apache.org/jira/browse/GOBBLIN-383
             Project: Apache Gobblin
          Issue Type: Bug
          Components: gobblin-compaction
    Affects Versions: 0.11.0
            Reporter: Sushant Pandey
            Assignee: Issac Buenrostro
         Attachments: mr_compact.txt

The output of a compaction job run on Snappy-compressed Avro files is not
compressed; as a result, the output file is considerably larger than the
combined size of the input files. The compaction job runs with the following
parameters:

{{fs.uri=hdfs://hdp-ubuntu-hadoop-mgr-1:8020}}
{{writer.fs.uri=${fs.uri}}}
{{job.name=CompactKafkaMR}}
{{job.group=PNDA}}
{{mr.job.max.mappers=5}}
{{compaction.datasets.finder=gobblin.compaction.dataset.TimeBasedSubDirDatasetsFinder}}
{{compaction.input.dir=/user/pnda/PNDA_datasets/datasets}}
{{compaction.dest.dir=/user/pnda/PNDA_datasets/compacted8}}
{{compaction.input.subdir=.}}
{{compaction.dest.subdir=.}}
{{compaction.timebased.folder.pattern='year='YYYY/'month='MM/'day='dd/'hour='HH}}
{{compaction.timebased.max.time.ago=10d}}
{{compaction.timebased.min.time.ago=1h}}
{{compaction.input.deduplicated=true}}
{{compaction.output.deduplicated=true}}
{{compaction.jobprops.creator.class=gobblin.compaction.mapreduce.MRCompactorTimeBasedJobPropCreator}}
{{compaction.job.runner.class=gobblin.compaction.mapreduce.avro.MRCompactorAvroKeyDedupJobRunner}}
{{compaction.timezone=UTC}}
{{compaction.job.overwrite.output.dir=true}}
{{compaction.recompact.from.input.for.late.data=true}}
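
One way to check whether a given input or output file is actually compressed is to read the codec recorded in its Avro container header. Per the Avro specification, the header metadata may carry an {{avro.codec}} entry, and a missing entry means the {{null}} (uncompressed) codec. The following is only a minimal sketch that uses nothing but the Python standard library; the function names are illustrative and are not part of Gobblin or of any Avro library.

```python
import io

AVRO_MAGIC = b"Obj\x01"  # first four bytes of every Avro object container file


def _read_long(stream):
    """Decode one zigzag-encoded, variable-length long (Avro binary encoding)."""
    shift = 0
    accum = 0
    while True:
        raw = stream.read(1)
        if not raw:
            raise EOFError("unexpected end of Avro header")
        byte = raw[0]
        accum |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear: last byte of this varint
            break
        shift += 7
    return (accum >> 1) ^ -(accum & 1)  # undo zigzag encoding


def _read_bytes(stream):
    """Read a length-prefixed byte string."""
    length = _read_long(stream)
    data = stream.read(length)
    if len(data) != length:
        raise EOFError("unexpected end of Avro header")
    return data


def read_avro_codec(stream):
    """Return the codec name stored in an Avro container header.

    `stream` is a binary file-like object positioned at the start of the file.
    A missing 'avro.codec' entry is reported as 'null' (uncompressed).
    """
    if stream.read(4) != AVRO_MAGIC:
        raise ValueError("not an Avro object container file")
    meta = {}
    while True:
        count = _read_long(stream)
        if count == 0:  # a zero block count ends the metadata map
            break
        if count < 0:  # negative count: a block byte size follows
            _read_long(stream)
            count = -count
        for _ in range(count):
            key = _read_bytes(stream).decode("utf-8")
            meta[key] = _read_bytes(stream)
    return meta.get("avro.codec", b"null").decode("utf-8")
```

For a file copied from HDFS to local disk (the file name here is hypothetical), {{read_avro_codec(open("part-r-00000.avro", "rb"))}} would be expected to return {{snappy}} for the compressed inputs and {{null}} for an uncompressed output.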

 

I tried the following configuration options, with no success:

{{mapreduce.output.fileoutputformat.compress=true}}
{{mapreduce.output.fileoutputformat.compress.codec=hadoop.io.compress.SnappyCodec}}
{{mapreduce.output.fileoutputformat.compress.type=RECORD}}

{{writer.codec.type=SNAPPY}}
{{writer.builder.class=gobblin.writer.AvroDataWriterBuilder}}


 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
