[ 
https://issues.apache.org/jira/browse/ARROW-12150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

abdel alfahham updated ARROW-12150:
-----------------------------------
    Description: 
Writing a pyarrow.Table that contains mixed-precision Decimals with 
pyarrow.parquet.write_table produces a Parquet file that contains invalid 
values.

In the example below, the first value of data_decimal changes from 
Decimal('579.11999511718795474735088646411895751953125000000000') in the 
pyarrow table to 
Decimal('-378.68971792399258172661600550482428224218070136475136') in the 
Parquet file.

 
{code:python}
import pyarrow
import pyarrow.parquet  # the parquet submodule must be imported explicitly
from decimal import Decimal

values_floats = [579.119995117188, 6.40999984741211, 2.0]  # floats
decs_from_values = [Decimal(v) for v in values_floats]  # Decimal
decs_from_float = [Decimal.from_float(v) for v in values_floats]  # Decimal using from_float
decs_str = [Decimal(str(v)) for v in values_floats]  # Decimal built from str

data_dict = {"data_decimal": decs_from_values,  # python Decimal
             "data_decimal_from_float": decs_from_float,  # python Decimal using from_float
             "data_float": values_floats,  # python floats
             "data_dec_str": decs_str}

table = pyarrow.table(data=data_dict)

print(table.to_pydict())  # before saving
pyarrow.parquet.write_table(table, "./pyarrow_decimal.parquet")  # saving
print(pyarrow.parquet.read_table("./pyarrow_decimal.parquet").to_pydict())  # after saving
{code}
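
To make "mixed-precision" concrete, here is a minimal standard-library sketch that only inspects the Python Decimals built from the same floats; it does not call pyarrow and does not claim to identify the root cause. The helper names significant_digits and fractional_digits are illustrative, not part of any library. The constant 38 is the maximum precision of a 128-bit decimal, included only as a point of comparison.

```python
from decimal import Decimal

# Maximum precision of a 128-bit decimal, used here only for comparison.
DECIMAL128_MAX_PRECISION = 38


def significant_digits(d: Decimal) -> int:
    """Number of digits in the coefficient of d (hypothetical helper)."""
    return len(d.as_tuple().digits)


def fractional_digits(d: Decimal) -> int:
    """Number of digits after the decimal point, 0 for integral values."""
    return max(0, -d.as_tuple().exponent)


values_floats = [579.119995117188, 6.40999984741211, 2.0]

for v in values_floats:
    exact = Decimal(v)       # exact binary expansion of the float
    short = Decimal(str(v))  # decimal built from the float's repr
    print(
        v,
        "| Decimal(v): digits =", significant_digits(exact),
        "scale =", fractional_digits(exact),
        "| Decimal(str(v)): digits =", significant_digits(short),
        "scale =", fractional_digits(short),
        "| exceeds 38 digits:",
        significant_digits(exact) > DECIMAL128_MAX_PRECISION,
    )
```

Each value in data_decimal and data_decimal_from_float carries a different number of digits, while the values in data_dec_str stay short, so the columns in the reproduction really do mix precisions and scales.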
 

  was:
Writing a pyarrow.Table that contains mixed-precision Decimals with 
pyarrow.parquet.write_table produces a Parquet file that contains invalid 
values.

In the example below, the first value of data_decimal changes from 
Decimal('579.11999511718795474735088646411895751953125000000000') in the 
pyarrow table to 
Decimal('-378.68971792399258172661600550482428224218070136475136') in the 
Parquet file.

 
import pyarrow
import pyarrow.parquet  # the parquet submodule must be imported explicitly
from decimal import Decimal

values_floats = [579.119995117188, 6.40999984741211, 2.0]  # floats
decs_from_values = [Decimal(v) for v in values_floats]  # Decimal
decs_from_float = [Decimal.from_float(v) for v in values_floats]  # Decimal using from_float
decs_str = [Decimal(str(v)) for v in values_floats]  # Decimal built from str

data_dict = {"data_decimal": decs_from_values,  # python Decimal
             "data_decimal_from_float": decs_from_float,  # python Decimal using from_float
             "data_float": values_floats,  # python floats
             "data_dec_str": decs_str}

table = pyarrow.table(data=data_dict)

print(table.to_pydict())  # before saving
pyarrow.parquet.write_table(table, "./pyarrow_decimal.parquet")  # saving
print(pyarrow.parquet.read_table("./pyarrow_decimal.parquet").to_pydict())  # after saving


> [Python] Invalid data when Decimal is exported to parquet 
> ----------------------------------------------------------
>
>                 Key: ARROW-12150
>                 URL: https://issues.apache.org/jira/browse/ARROW-12150
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 3.0.0
>         Environment: - macOS Big Sur 11.2.1
> - python 3.8.2
>            Reporter: abdel alfahham
>            Priority: Major
>
> Writing a pyarrow.Table that contains mixed-precision Decimals with 
> pyarrow.parquet.write_table produces a Parquet file that contains invalid 
> values.
> In the example below, the first value of data_decimal changes from 
> Decimal('579.11999511718795474735088646411895751953125000000000') in the 
> pyarrow table to 
> Decimal('-378.68971792399258172661600550482428224218070136475136') in the 
> Parquet file.
>  
> {code:python}
> import pyarrow
> import pyarrow.parquet  # the parquet submodule must be imported explicitly
> from decimal import Decimal
>
> values_floats = [579.119995117188, 6.40999984741211, 2.0]  # floats
> decs_from_values = [Decimal(v) for v in values_floats]  # Decimal
> decs_from_float = [Decimal.from_float(v) for v in values_floats]  # Decimal using from_float
> decs_str = [Decimal(str(v)) for v in values_floats]  # Decimal built from str
>
> data_dict = {"data_decimal": decs_from_values,  # python Decimal
>              "data_decimal_from_float": decs_from_float,  # python Decimal using from_float
>              "data_float": values_floats,  # python floats
>              "data_dec_str": decs_str}
>
> table = pyarrow.table(data=data_dict)
>
> print(table.to_pydict())  # before saving
> pyarrow.parquet.write_table(table, "./pyarrow_decimal.parquet")  # saving
> print(pyarrow.parquet.read_table("./pyarrow_decimal.parquet").to_pydict())  # after saving
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
