jorisvandenbossche commented on a change in pull request #11724: URL: https://github.com/apache/arrow/pull/11724#discussion_r759490761
##########
File path: python/pyarrow/_parquet.pyx
##########
@@ -880,6 +880,21 @@ cdef encoding_name_from_enum(ParquetEncoding encoding_):
     }.get(encoding_, 'UNKNOWN')
 
+cdef encoding_enum_from_name(str encoding_name):
+    enc = {
+        'PLAIN': ParquetEncoding_PLAIN,
+        'BIT_PACKED': ParquetEncoding_BIT_PACKED,
+        'RLE': ParquetEncoding_RLE,
+        'BYTE_STREAM_SPLIT': ParquetEncoding_BYTE_STREAM_SPLIT,
+        'DELTA_BINARY_PACKED': ParquetEncoding_DELTA_BINARY_PACKED,
+        'DELTA_BYTE_ARRAY': ParquetEncoding_DELTA_BYTE_ARRAY,
+    }.get(encoding_name, None)
+    if enc is None:
+        raise ValueError(f"Unsupported column encoding: {encoding_name!r}")

Review comment:
   IIRC the "BIT_PACKED" encoding is also only supported for repetition/definition levels, whereas the encoding specified here applies to the actual values. So we could also leave it out of this mapping, in which case the user gets the "Unsupported column encoding" message from the snippet above. By leaving it in, we instead bubble up the error message that the C++ parquet code raises (from https://github.com/apache/arrow/blob/2baed02d259fe3534d20de14af6f7df48611b858/cpp/src/parquet/encoding.cc#L2643-L2705).