[ https://issues.apache.org/jira/browse/HADOOP-2406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Chris Douglas updated HADOOP-2406:
----------------------------------

    Status: Patch Available  (was: Open)

> Micro-benchmark to measure read/write times through InputFormats
> ----------------------------------------------------------------
>
>                 Key: HADOOP-2406
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2406
>             Project: Hadoop
>          Issue Type: Test
>          Components: fs, test
>            Reporter: Chris Douglas
>            Assignee: Chris Douglas
>             Fix For: 0.16.0
>
>         Attachments: 2406-0.patch, 2406-1.patch
>
>
> The attached test writes/reads X GB to/from the default filesystem through
> SequenceFileInputFormat and TextInputFormat, using LzoCodec, GzipCodec, and
> no compression, with both block and record compression for SequenceFiles.
> The following results, using 10 GB of data through RawLocalFileSystem with
> 5-word keys and 20-word values (generated by RandomTextWriter with the same
> seed for each file), are fairly stable:
>
> Writes:
> || Format || Compression || Type || Time (sec) || File size (bytes) ||
> | SEQ | LZO | BLOCK | 318 | 8 604 288 397 |
> | SEQ | LZO | RECORD | 367 | 11 689 969 413 |
> | SEQ | ZIP | BLOCK | 929 | 2 827 697 769 |
> | SEQ | ZIP | RECORD | 1737 | 9 324 730 365 |
> | SEQ | none | n/a | 201 | 11 282 745 683 |
> | TXT | LZO | n/a | 742 | 12 671 065 769 |
> | TXT | ZIP | n/a | 1320 | 2 597 397 680 |
> | TXT | none | n/a | 392 | 10 818 058 643 |
>
> Reads:
> || Format || Compression || Type || Time (sec) ||
> | SEQ | LZO | BLOCK | 150 |
> | SEQ | LZO | RECORD | 281 |
> | SEQ | ZIP | BLOCK | 155 |
> | SEQ | ZIP | RECORD | 548 |
> | SEQ | none | n/a | 209 |
> | TXT | LZO | n/a | 620 |
> | TXT | ZIP | n/a | 355 |
> | TXT | none | n/a | 284 |
>
> Of note:
> - LZO-compressed text output is larger than the uncompressed output
>   (HADOOP-2402); lzop cannot read it.
> - Zip (GzipCodec) compression is expensive. Short values are responsible for
>   the unimpressive compression of record-compressed SequenceFiles.
> - TextInputFormat is slow (HADOOP-2285). TextOutputFormat also looks suspect.
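The observation that short values explain the weak ratio of record-compressed SequenceFiles can be illustrated outside Hadoop. The sketch below is a rough approximation, not the benchmark in the attached patch: it uses `java.util.zip.Deflater` directly instead of SequenceFile or GzipCodec, and the vocabulary and helper names (`makeValues`, `recordCompressed`, `blockCompressed`) are hypothetical. It compresses each short value on its own (mimicking RECORD compression, which compresses values individually) and then all values concatenated into one buffer (mimicking BLOCK compression).

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.zip.Deflater;

// Rough illustration only: java.util.zip.Deflater stands in for Hadoop's codecs,
// and nothing here uses SequenceFile itself.
class RecordVsBlockCompression {

    // Hypothetical vocabulary; RandomTextWriter draws from its own, larger word list.
    static final String[] VOCAB = {
        "alpha", "bravo", "charlie", "delta", "echo", "foxtrot", "golf", "hotel",
        "india", "juliet", "kilo", "lima", "mike", "november", "oscar", "papa",
        "quebec", "romeo", "sierra", "tango", "uniform", "victor", "whiskey", "xray"
    };

    // Build `count` values, each made of `wordsPerValue` space-separated random words.
    static List<byte[]> makeValues(int count, int wordsPerValue, long seed) {
        Random rnd = new Random(seed);
        List<byte[]> values = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            StringBuilder sb = new StringBuilder();
            for (int w = 0; w < wordsPerValue; w++) {
                if (w > 0) {
                    sb.append(' ');
                }
                sb.append(VOCAB[rnd.nextInt(VOCAB.length)]);
            }
            values.add(sb.toString().getBytes(StandardCharsets.UTF_8));
        }
        return values;
    }

    // Compress one buffer with a fresh Deflater (zlib format) and return the output size.
    static int deflatedSize(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        byte[] buf = new byte[8192];
        int total = 0;
        while (!deflater.finished()) {
            total += deflater.deflate(buf);
        }
        deflater.end();
        return total;
    }

    // RECORD-style: every value is compressed separately, paying header/trailer each time.
    static long recordCompressed(List<byte[]> values) {
        long total = 0;
        for (byte[] v : values) {
            total += deflatedSize(v);
        }
        return total;
    }

    // BLOCK-style: all values are concatenated and compressed as one buffer.
    static long blockCompressed(List<byte[]> values) {
        ByteArrayOutputStream block = new ByteArrayOutputStream();
        for (byte[] v : values) {
            block.write(v, 0, v.length);
        }
        return deflatedSize(block.toByteArray());
    }

    // Total uncompressed size of all values.
    static long rawSize(List<byte[]> values) {
        long total = 0;
        for (byte[] v : values) {
            total += v.length;
        }
        return total;
    }

    public static void main(String[] args) {
        List<byte[]> values = makeValues(10_000, 20, 42L);
        System.out.printf("raw=%d record=%d block=%d%n",
                rawSize(values), recordCompressed(values), blockCompressed(values));
    }
}
```

Compressing each value separately pays the zlib header and checksum on every record and cannot reuse matches from earlier records, so with 20-word values the block total should come out far smaller than the record total; exact numbers depend on the assumed vocabulary and will not match the table above.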