Christopher Aycock created ARROW-375:
----------------------------------------
             Summary: columns parameter in parquet.read_table() raises KeyError for valid column
                 Key: ARROW-375
                 URL: https://issues.apache.org/jira/browse/ARROW-375
             Project: Apache Arrow
          Issue Type: Bug
          Components: Python
            Reporter: Christopher Aycock


Using arrow commit 4fa7ac4 and parquet-cpp commit 0024665, I have

{code:none}
In [1]: from pyarrow import parquet

In [2]: t = parquet.read_table('/Users/christophercaycock/Desktop/sample.parquet')

In [3]: t.to_pandas()
Out[3]:
   age name
0    1    A
1    2    B
2    3    C

In [4]: t = parquet.read_table('/Users/christophercaycock/Desktop/sample.parquet', columns=['age'])
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-4-5cf213819489> in <module>()
----> 1 t = parquet.read_table('/Users/christophercaycock/Desktop/sample.parquet', columns=['age'])

/Users/christophercaycock/Desktop/arrow/python/pyarrow/parquet.pyx in pyarrow.parquet.read_table (/Users/christophercaycock/Desktop/arrow/python/build/temp.macosx-10.6-x86_64-3.5/parquet.cxx:2693)()
    143         return reader.read_all()
    144     else:
--> 145         column_idxs = [reader.column_name_idx(column) for column in columns]
    146         arrays = [reader.read_column(column_idx) for column_idx in column_idxs]
    147         return Table.from_arrays(columns, arrays)

/Users/christophercaycock/Desktop/arrow/python/pyarrow/parquet.pyx in pyarrow.parquet.ParquetReader.column_name_idx (/Users/christophercaycock/Desktop/arrow/python/build/temp.macosx-10.6-x86_64-3.5/parquet.cxx:2232)()
    102                 self.column_idx_map[str(metadata.schema().Column(i).path().get().ToDotString())] = i
    103
--> 104         return self.column_idx_map[column_name]
    105
    106     def read_column(self, int column_index):

KeyError: 'age'
{code}

This happens on both Mac and Linux.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
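
One possible explanation for the traceback above, offered as an unconfirmed hypothesis: the build directory name suggests Python 3.5, and line 102 of parquet.pyx wraps the column path in str(...). If that path reaches Python as a bytes object (an assumption, not verified against the source), str() produces the repr "b'age'" rather than "age", so the lookup on line 104 would miss. The sketch below reproduces that behaviour in plain Python only. The helper names and the raw_names list are hypothetical stand-ins and are not part of pyarrow.

```python
# Hypothetical stand-in for the column_idx_map built in
# ParquetReader.column_name_idx; nothing here calls pyarrow.
# Assumption: under Python 3 the column names arrive from C++ as bytes.


def build_index_with_str(raw_names):
    """Mimic the str(...) call shown at line 102 of the traceback."""
    return {str(name): i for i, name in enumerate(raw_names)}


def build_index_with_decode(raw_names):
    """Decode the bytes explicitly so the keys are plain text names."""
    return {name.decode("utf-8"): i for i, name in enumerate(raw_names)}


raw_names = [b"age", b"name"]  # assumed form of the names coming from C++

str_index = build_index_with_str(raw_names)
decoded_index = build_index_with_decode(raw_names)

print(sorted(str_index))     # on Python 3 the keys carry the b'...' repr
print("age" in str_index)    # False: this lookup would raise KeyError
print(decoded_index["age"])  # 0: lookup succeeds once names are decoded
```

If this is indeed the cause, decoding the bytes before using them as dictionary keys would make columns=['age'] resolve as expected. Whether that matches the actual code path in parquet.pyx would need to be checked against the commits listed above.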