shanjixi commented on issue #36608:
URL: https://github.com/apache/arrow/issues/36608#issuecomment-1635232682

   > When you talk about blocks are you talking about the snappy framing format? https://github.com/google/snappy/blob/main/framing_format.txt
   
   NOT CSV files. We are reading TFRecord files that are compressed in the Hadoop-Snappy format, such as training_instance_0001.tfrecord.snappy.
   
   There are 3 kinds of general Snappy-compressed files (not counting Parquet, which uses Snappy internally), and this issue is about the "hadoop-snappy file":
   
   1. snappy whole file
   2. snappy framing (Google, rarely used)
   3. hadoop-snappy file (widely used in the big-data ecosystem)
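   To show how the three layouts differ on disk, here is a rough Python heuristic for telling them apart. The framing check uses the stream identifier chunk defined in the framing spec linked above. The Hadoop check is an assumption: it only tests whether the file begins with two big-endian 4-byte lengths (block length, then first chunk length) and whether that chunk fits in the file. `guess_snappy_flavor` is a hypothetical helper and can misclassify input, so treat it as a sketch rather than a reliable detector.

```python
import struct

# Stream identifier chunk that starts every Snappy framing-format file:
# chunk type 0xff, 3-byte little-endian length 6, then b"sNaPpY".
FRAMING_MAGIC = b"\xff\x06\x00\x00sNaPpY"


def guess_snappy_flavor(data: bytes) -> str:
    """Heuristically classify Snappy-compressed bytes (hypothetical helper).

    Returns "framing", "hadoop", or "raw". The "hadoop" test only checks that
    the first two big-endian uint32 values look like a block length and a
    chunk length that fits in the remaining bytes, so it can misfire.
    """
    if data.startswith(FRAMING_MAGIC):
        return "framing"
    if len(data) >= 8:
        block_len, chunk_len = struct.unpack_from(">II", data, 0)
        if block_len > 0 and 0 < chunk_len <= len(data) - 8:
            return "hadoop"
    # A whole-file (raw) Snappy stream starts with a varint of the
    # uncompressed length and has no magic bytes, so it is the fallback.
    return "raw"


# Raw Snappy encoding of b"hello world!!": varint 13, literal tag 0x30, bytes.
raw = b"\x0d\x30hello world!!"
# The same raw chunk wrapped in one Hadoop block: block len 13, chunk len 15.
hadoop = struct.pack(">II", 13, len(raw)) + raw

print(guess_snappy_flavor(FRAMING_MAGIC))  # framing
print(guess_snappy_flavor(hadoop))         # hadoop
print(guess_snappy_flavor(raw))            # raw
```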
   
   **2.** is different from **3.**: we can treat **3.** as a general block-based compressed format (the codec used for the blocks could be replaced by ZSTD, gzip, and so on).
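   To make the "generic block format" point concrete, here is a minimal Python sketch of the Hadoop block layout. It assumes the layout produced by Hadoop's block compressor streams: each block is a 4-byte big-endian uncompressed length, followed by one or more chunks, each a 4-byte big-endian compressed length plus the compressed bytes. The codec is a parameter, and the standard library's `zlib` stands in for Snappy so the example runs without third-party packages. `encode_hadoop_blocks` and `decode_hadoop_blocks` are hypothetical names.

```python
import struct
import zlib
from typing import Callable

# Assumed Hadoop block layout (per block):
#   uint32 big-endian: uncompressed length of the block
#   then chunks until that many bytes have been produced:
#     uint32 big-endian: compressed length of the chunk
#     <compressed chunk bytes>


def encode_hadoop_blocks(data: bytes, compress: Callable[[bytes], bytes],
                         block_size: int = 64 * 1024) -> bytes:
    """Write one compressed chunk per block (a simplification)."""
    out = bytearray()
    for start in range(0, len(data), block_size):
        block = data[start:start + block_size]
        chunk = compress(block)
        out += struct.pack(">II", len(block), len(chunk))
        out += chunk
    return bytes(out)


def decode_hadoop_blocks(data: bytes,
                         decompress: Callable[[bytes], bytes]) -> bytes:
    """Decode every block, allowing several chunks per block."""
    out = bytearray()
    pos = 0
    while pos < len(data):
        (block_len,) = struct.unpack_from(">I", data, pos)
        pos += 4
        produced = 0
        # Keep reading chunks until the block's uncompressed size is reached.
        while produced < block_len:
            (chunk_len,) = struct.unpack_from(">I", data, pos)
            pos += 4
            chunk = data[pos:pos + chunk_len]
            if len(chunk) != chunk_len:
                raise ValueError("truncated chunk")
            pos += chunk_len
            piece = decompress(chunk)
            out += piece
            produced += len(piece)
        if produced != block_len:
            raise ValueError("decompressed size does not match block header")
    return bytes(out)


# zlib stands in for Snappy here; only the per-chunk codec would differ.
payload = b"example record bytes " * 5000
blob = encode_hadoop_blocks(payload, zlib.compress, block_size=4096)
assert decode_hadoop_blocks(blob, zlib.decompress) == payload
```

   To read real hadoop-snappy files you would pass a raw (unframed) Snappy decompressor instead, for example `snappy.decompress` from the python-snappy package, assuming it is installed. Keeping the codec as a parameter mirrors the point above: only the per-chunk decompression changes between Snappy, ZSTD, gzip, and so on.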
   
   

