Wes McKinney created PARQUET-816:
------------------------------------
Summary: [C++] Failure decoding sample dict-encoded file from parquet-compatibility project
Key: PARQUET-816
URL: https://issues.apache.org/jira/browse/PARQUET-816
Project: Parquet
Issue Type: Bug
Components: parquet-cpp
Reporter: Wes McKinney
Attachments: nation.dict.parquet
See the attached file. Reading it with parquet_reader throws an exception:
{code}
$ debug/parquet_reader ~/code/fastparquet/test-data/nation.dict.parquet
File statistics:
Version: 1
Created By: parquet-mr
Total rows: 25
Number of RowGroups: 1
Number of Real Columns: 4
Number of Columns: 4
Number of Selected Columns: 4
Column 0: nation_key (INT32)
Column 1: name (BYTE_ARRAY)
Column 2: region_key (INT32)
Column 3: comment_col (BYTE_ARRAY)
--- Row Group 0 ---
--- Total Bytes 0 ---
rows: 25---
Column 0
, values: 25 Statistics Not Set
compression: UNCOMPRESSED, encodings:
uncompressed size: 125, compressed size: 125
Column 1
, values: 25 Statistics Not Set
compression: UNCOMPRESSED, encodings:
uncompressed size: 322, compressed size: 322
Column 2
, values: 25 Statistics Not Set
compression: UNCOMPRESSED, encodings:
uncompressed size: 125, compressed size: 125
Column 3
, values: 25 Statistics Not Set
compression: UNCOMPRESSED, encodings:
uncompressed size: 2002, compressed size: 2002
nation_key name region_key
comment_col
0 Parquet error: Unexpected end of stream.
{code}
However, I verified that Impala reads the same file without errors:
{code}
In [13]: hdfs.put('/tmp/nation-dict-test/test.parq', 'nation.dict.parquet')
Out[13]: '/tmp/nation-dict-test/test.parq'
In [14]: pf = con.parquet_file('/tmp/nation-dict-test')
In [15]: pf.execute()
Out[15]:
nation_key name region_key \
0 0 ALGERIA 0
1 1 ARGENTINA 1
2 2 BRAZIL 1
3 3 CANADA 1
4 4 EGYPT 4
5 5 ETHIOPIA 0
6 6 FRANCE 3
7 7 GERMANY 3
8 8 INDIA 2
9 9 INDONESIA 2
10 10 IRAN 4
11 11 IRAQ 4
12 12 JAPAN 2
13 13 JORDAN 4
14 14 KENYA 0
15 15 MOROCCO 0
16 16 MOZAMBIQUE 0
17 17 PERU 1
18 18 CHINA 2
19 19 ROMANIA 3
20 20 SAUDI ARABIA 4
21 21 VIETNAM 2
22 22 RUSSIA 3
23 23 UNITED KINGDOM 3
24 24 UNITED STATES 1
comment_col
0 haggle. carefully final deposits detect slyly...
1 al foxes promise slyly according to the regula...
2 y alongside of the pending deposits. carefully...
3 eas hang ironic, silent packages. slyly regula...
4 y above the carefully unusual theodolites. fin...
5 ven packages wake quickly. regu
6 refully final requests. regular, ironi
7 l platelets. regular accounts x-ray: unusual, ...
8 ss excuses cajole slyly across the packages. d...
9 slyly express asymptotes. regular deposits ha...
10 efully alongside of the slyly final dependenci...
11 nic deposits boost atop the quickly final requ...
12 ously. final, express gifts cajole a
13 ic deposits are blithely about the carefully r...
14 pending excuses haggle furiously deposits. pe...
15 rns. blithely bold courts among the closely re...
16 s. ironic, unusual asymptotes wake blithely r
17 platelets. blithely pending dependencies use f...
18 c dependencies. furiously express notornis sle...
19 ular asymptotes are about the furious multipli...
20 ts. silent requests haggle. closely express pa...
21 hely enticingly express accounts. even, final
22 requests against the platelets use never acco...
23 eans boost carefully special requests. account...
24 y final packages. slow foxes cajole quickly. q...
{code}
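Since Impala reads the file, it is presumably well-formed, which points at the parquet-cpp decoder rather than the data. As an independent sanity check that the file was not truncated in transit, here is a minimal sketch that inspects only the container framing of an unencrypted Parquet file: the leading and trailing PAR1 magic bytes and the 4-byte little-endian footer length stored just before the trailing magic. The function name check_parquet_framing is hypothetical, and the check says nothing about the dictionary pages themselves.

```python
import os
import struct

PARQUET_MAGIC = b"PAR1"


def check_parquet_framing(path):
    """Return the declared footer length if the file framing looks sane.

    Hypothetical helper for illustration only. Raises ValueError if the
    file is too short, a magic marker is missing, or the declared footer
    length cannot fit inside the file.
    """
    size = os.path.getsize(path)
    # Smallest possible layout: leading magic (4) + footer length (4)
    # + trailing magic (4).
    if size < 12:
        raise ValueError("file too small to be Parquet: %d bytes" % size)
    with open(path, "rb") as f:
        head = f.read(4)
        f.seek(-8, os.SEEK_END)
        tail = f.read(8)
    if head != PARQUET_MAGIC or tail[4:] != PARQUET_MAGIC:
        raise ValueError("missing PAR1 magic marker")
    # The footer length is a little-endian unsigned 32-bit integer.
    (footer_len,) = struct.unpack("<I", tail[:4])
    if footer_len > size - 12:
        raise ValueError("footer length %d exceeds file size" % footer_len)
    return footer_len
```

Calling check_parquet_framing('nation.dict.parquet') on the attachment would be expected to return without raising if the file arrived intact. A passing check only rules out truncation of the file as a whole; it does not explain the "Unexpected end of stream" error raised while decoding column data.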