jorisvandenbossche commented on a change in pull request #11724:
URL: https://github.com/apache/arrow/pull/11724#discussion_r759490761



##########
File path: python/pyarrow/_parquet.pyx
##########
@@ -880,6 +880,21 @@ cdef encoding_name_from_enum(ParquetEncoding encoding_):
     }.get(encoding_, 'UNKNOWN')
 
 
+cdef encoding_enum_from_name(str encoding_name):
+    enc = {
+        'PLAIN': ParquetEncoding_PLAIN,
+        'BIT_PACKED': ParquetEncoding_BIT_PACKED,
+        'RLE': ParquetEncoding_RLE,
+        'BYTE_STREAM_SPLIT': ParquetEncoding_BYTE_STREAM_SPLIT,
+        'DELTA_BINARY_PACKED': ParquetEncoding_DELTA_BINARY_PACKED,
+        'DELTA_BYTE_ARRAY': ParquetEncoding_DELTA_BYTE_ARRAY,
+    }.get(encoding_name, None)
+    if enc is None:
+        raise ValueError(f"Unsupported column encoding: {encoding_name!r}")

Review comment:
IIRC the "BIT_PACKED" encoding is also only supported for
repetition/definition levels, while the encoding specified here applies to the
actual values.
So we could also leave it out of the list below, in which case it would hit the
"Unsupported column encoding" message from the snippet above. By leaving it in
here, we instead bubble up the error message that the C++ parquet code raises
(from
https://github.com/apache/arrow/blob/2baed02d259fe3534d20de14af6f7df48611b858/cpp/src/parquet/encoding.cc#L2643-L2705).
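
To make the first option concrete, here is a rough pure-Python sketch of a lookup that leaves "BIT_PACKED" out, so the name is rejected up front with the "Unsupported column encoding" message. The `Encoding` enum is a hypothetical stand-in for the `ParquetEncoding_*` values in the diff, and `VALUE_ENCODINGS` is a name made up for illustration; this is not the actual pyarrow code.

```python
from enum import Enum


class Encoding(Enum):
    # Hypothetical stand-in for the ParquetEncoding_* enum values in the diff.
    PLAIN = 0
    BIT_PACKED = 1
    RLE = 2
    BYTE_STREAM_SPLIT = 3
    DELTA_BINARY_PACKED = 4
    DELTA_BYTE_ARRAY = 5


# Names accepted for column values. BIT_PACKED is deliberately left out,
# on the assumption that it is only usable for repetition/definition levels.
VALUE_ENCODINGS = {
    'PLAIN': Encoding.PLAIN,
    'RLE': Encoding.RLE,
    'BYTE_STREAM_SPLIT': Encoding.BYTE_STREAM_SPLIT,
    'DELTA_BINARY_PACKED': Encoding.DELTA_BINARY_PACKED,
    'DELTA_BYTE_ARRAY': Encoding.DELTA_BYTE_ARRAY,
}


def encoding_enum_from_name(encoding_name: str) -> Encoding:
    """Map an encoding name to its enum value, rejecting unknown names."""
    enc = VALUE_ENCODINGS.get(encoding_name)
    if enc is None:
        raise ValueError(f"Unsupported column encoding: {encoding_name!r}")
    return enc
```

With this variant, `encoding_enum_from_name('BIT_PACKED')` raises the `ValueError` on the Python side, whereas keeping the entry in the dict would defer the failure to whatever error the C++ layer reports.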





