shanjixi commented on issue #36608: URL: https://github.com/apache/arrow/issues/36608#issuecomment-1635232682
> When you talk about blocks are you talking about the snappy framing format? https://github.com/google/snappy/blob/main/framing_format.txt

Not CSV files. We are reading TFRecord files that are compressed in the Hadoop-Snappy format, for example training_instance_0001.tfrecord.snappy.

There are three general kinds of Snappy-compressed files (not counting Parquet, which uses Snappy internally), and this issue is about the "hadoop-snappy file" kind:

1. Snappy whole file
2. Snappy framing (Google, rarely used)
3. Hadoop-Snappy file (which is widely used in the big-data ecosystem)

**2.** is different from **3.** We can treat **3.** as a general block-based compressed format (the codec for the blocks could be replaced by ZSTD, gzip, and so on).
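To make the "block-based, codec-agnostic" point concrete, here is a minimal Python sketch of a reader for that kind of container. It assumes the layout that, to my understanding, Hadoop's block compressor stream writes: a 4-byte big-endian uncompressed block length, followed by one or more chunks, each prefixed with its own 4-byte big-endian compressed length. The function names are hypothetical, and `zlib` stands in for the codec only so the example runs with the standard library; for real Hadoop-Snappy files you would pass a raw Snappy decompressor (for example from the python-snappy package) instead.

```python
import io
import struct
import zlib
from typing import BinaryIO, Callable, Iterator


def read_hadoop_blocks(
    stream: BinaryIO, decompress: Callable[[bytes], bytes]
) -> Iterator[bytes]:
    """Yield the uncompressed bytes of each block (assumed layout, see above)."""
    while True:
        header = stream.read(4)
        if not header:
            return  # clean end of stream
        if len(header) < 4:
            raise ValueError("truncated block header")
        (raw_len,) = struct.unpack(">I", header)  # uncompressed size of block

        parts = []
        got = 0
        # A block may be split into several independently compressed chunks.
        while got < raw_len:
            size_bytes = stream.read(4)
            if len(size_bytes) < 4:
                raise ValueError("truncated chunk header")
            (comp_len,) = struct.unpack(">I", size_bytes)
            payload = stream.read(comp_len)
            if len(payload) < comp_len:
                raise ValueError("truncated chunk payload")
            chunk = decompress(payload)  # codec is pluggable
            parts.append(chunk)
            got += len(chunk)
        if got != raw_len:
            raise ValueError("block length mismatch")
        yield b"".join(parts)


def write_hadoop_block(
    raw: bytes, compress: Callable[[bytes], bytes], chunk_size: int
) -> bytes:
    """Hypothetical helper that builds one block, used only for the demo."""
    out = [struct.pack(">I", len(raw))]
    for i in range(0, len(raw), chunk_size):
        payload = compress(raw[i : i + chunk_size])
        out.append(struct.pack(">I", len(payload)))
        out.append(payload)
    return b"".join(out)


# Demo with zlib as a stand-in codec: two blocks, the first split into chunks.
demo = write_hadoop_block(b"hello " * 100, zlib.compress, 256)
demo += write_hadoop_block(b"world", zlib.compress, 256)
blocks = list(read_hadoop_blocks(io.BytesIO(demo), zlib.decompress))
```

Because the codec is just a callable, swapping Snappy for ZSTD or gzip changes only the `decompress` argument, not the block parsing.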
