[ https://issues.apache.org/jira/browse/AVRO-135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12801995#action_12801995 ]

Philip Zeyliger commented on AVRO-135:
--------------------------------------

I've spent some time thinking about an interface for Codec and want to let it 
digest a bit longer.

DataFileWriter gets a DatumWriter and a datum, and uses 
DatumWriter.write(D datum, Encoder) to write the value.  In turn, the Encoder 
(a BinaryEncoder in this case) writes to an OutputStream.  The current 
approach is to encode into a ByteBufferOutputStream and, when that reaches a 
certain size, copy it into the final output.  I'm trying to figure out where 
a Codec interface fits in here.  It could:
* Pretend to be an OutputStream, i.e., be the stream passed when constructing 
BinaryEncoder(), and offer an uncompressedSize() method as well as a writeTo() 
method.  So, a replacement for ByteBufferOutputStream (rough sketch after this 
list).
* Pretend to be an Encoder.  The advantage here is that you could build a 
compression scheme that was schema-aware (e.g., semi-columnar or PAX-like), 
without re-parsing the data.
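
To make the two shapes concrete, here is a rough sketch.  The class names 
(CodecOutputStream, CodecEncoder) and the writeBlock helper are placeholders 
for discussion, not a proposed API, and it assumes the current 
BinaryEncoder(OutputStream) constructor:

{code:java}
import java.io.IOException;
import java.io.OutputStream;

import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.DatumWriter;
import org.apache.avro.io.Encoder;

// Option 1: the codec looks like an OutputStream.  BinaryEncoder writes into
// it instead of into ByteBufferOutputStream; the file writer checks
// uncompressedSize() to decide when a block is full and then calls writeTo()
// to emit the (possibly compressed) block.
abstract class CodecOutputStream extends OutputStream {
  /** Uncompressed bytes buffered so far. */
  abstract long uncompressedSize();

  /** Compress the buffered bytes and write them to the given stream. */
  abstract void writeTo(OutputStream out) throws IOException;
}

// Option 2: the codec looks like an Encoder.  It sees the typed calls
// (writeInt, writeString, ...) directly, so a schema-aware layout
// (semi-columnar / PAX-like) is possible without re-parsing the data.
abstract class CodecEncoder extends Encoder {
  /** Write the encoded/compressed block to the given stream. */
  abstract void writeTo(OutputStream out) throws IOException;
}

// How option 1 would slot into the current write path, where a
// ByteBufferOutputStream sits today.
class Example {
  static <D> void writeBlock(DatumWriter<D> writer, D datum,
                             CodecOutputStream codec, OutputStream fileOut,
                             int syncInterval) throws IOException {
    BinaryEncoder encoder = new BinaryEncoder(codec); // encodes into the codec
    writer.write(datum, encoder);       // user-facing write path is unchanged
    encoder.flush();                    // make sure bytes reach the codec
    if (codec.uncompressedSize() >= syncInterval) {
      codec.writeTo(fileOut);           // flush one block to the file
    }
  }
}
{code}

Note that the second shape means implementing the full Encoder surface (all 
the typed write* calls), while the first only has to buffer bytes.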

I'm leaning towards the former right now.  What do you mean by compress(byte[], 
int, int, Encoder) above?

> add compression to data files
> -----------------------------
>
>                 Key: AVRO-135
>                 URL: https://issues.apache.org/jira/browse/AVRO-135
>             Project: Avro
>          Issue Type: New Feature
>          Components: java, spec
>            Reporter: Doug Cutting
>            Assignee: Philip Zeyliger
>            Priority: Blocker
>             Fix For: 1.3.0
>
>         Attachments: AVRO-135.patch.txt
>
>
> We should add support for at least one compression codec to data files.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
