Re: Filtering based on a float value with more than one decimal place not working correctly in Pyspark dataframe

2018-09-26 Thread Sean Owen
Is this not just a case of floating-point literals not being exact? This is
expressed in Python, not SQL.
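Here is a minimal sketch of the effect Sean describes, using only the standard library. It assumes that the FloatType column stores each value as a 32-bit float. It also assumes that Spark widens that value to a 64-bit double when comparing it with a Python float literal such as 1.236. The helper `to_float32` is hypothetical and only imitates the 32-bit storage with `struct`.

```python
import struct


def to_float32(x: float) -> float:
    """Hypothetical helper: round a 64-bit Python float to the nearest 32-bit
    float (as a FloatType column would store it), then widen it back."""
    return struct.unpack("f", struct.pack("f", x))[0]


# The three values inserted into df_test.
for literal in (5.0, 1.236, -0.31):
    stored = to_float32(literal)
    # Equality holds only if the 32-bit value happens to equal the 64-bit literal.
    print(literal, stored, stored == literal)
```

Only 5.0 compares equal, because it is exactly representable in binary. 1.236 and -0.31 are stored as the nearest 32-bit values, which differ from the 64-bit literals by less than one part in ten million, but still differ.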

On Wed, Sep 26, 2018 at 12:46 AM Meethu Mathew wrote:

> [...]
> Is this a bug?


Re: Filtering based on a float value with more than one decimal place not working correctly in Pyspark dataframe

2018-09-26 Thread Sandeep Katta
I think it is similar to SPARK-25452.

Regards
Sandeep Katta

On Wed, 26 Sep 2018 at 11:16 AM, Meethu Mathew wrote:

> [...]
> Is this a bug?


Filtering based on a float value with more than one decimal place not working correctly in Pyspark dataframe

2018-09-25 Thread Meethu Mathew
Hi all,

I tried the following code and the output was not as expected.

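from pyspark.sql.types import StructType, StructField, StringType, FloatType

# 'Value' is declared as FloatType, i.e. a 32-bit float column.
# Assumes an existing SparkSession named `spark` (as in the pyspark shell).
schema = StructType([StructField('Id', StringType(), False),
                     StructField('Value', FloatType(), False)])
df_test = spark.createDataFrame([('a', 5.0), ('b', 1.236), ('c', -0.31)], schema)
df_test

Output: DataFrame[Id: string, Value: float]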
But when the value is given as a string, it worked.

Again tried with a floating point number with one decimal place and it
worked.
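The number of decimal places is probably not what matters here. If the one-decimal value tried was 5.0 (presumably, since it is the only such value in df_test), it matched because 5.0 is exactly representable in binary floating point. The sketch below reuses the same hypothetical `to_float32` helper. It suggests that other one-decimal values, such as 0.1 or 1.2, would not survive a round trip through 32 bits, while 1.25 (two decimal places) does.

```python
import struct


def to_float32(x: float) -> float:
    """Hypothetical helper: round to the nearest 32-bit float, as FloatType stores it."""
    return struct.unpack("f", struct.pack("f", x))[0]


def survives_float32(x: float) -> bool:
    """True if x is unchanged by a round trip through 32-bit float storage."""
    return to_float32(x) == x


# Values that are short sums of powers of two survive; most decimal fractions do not.
for value in (5.0, 0.5, 1.25, 0.1, 1.2, 1.236):
    print(value, survives_float32(value))
# 5.0 True, 0.5 True, 1.25 True, 0.1 False, 1.2 False, 1.236 False
```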
And when the equality operator is changed to greater-than or less-than, it
works with numbers that have more than one decimal place.
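Ordering comparisons usually look correct because the 32-bit rounding error is far smaller than the gaps between the stored values. Right at the boundary, though, they are affected too: under the same assumptions as above, row 'b' sorts just below the literal 1.236. The sketch below shows two possible workarounds in plain Python, again using the hypothetical `to_float32` helper. The first rounds the literal to 32 bits before comparing; the second compares with a tolerance. In PySpark these would presumably correspond to something like `df_test.Value == F.lit(1.236).cast('float')` or `F.abs(df_test.Value - 1.236) < 1e-6` (with `F` being `pyspark.sql.functions`), or to declaring the column as DoubleType instead. Those Spark expressions are untested here.

```python
import math
import struct


def to_float32(x: float) -> float:
    """Hypothetical helper: round to the nearest 32-bit float, as FloatType stores it."""
    return struct.unpack("f", struct.pack("f", x))[0]


stored = to_float32(1.236)  # value held for row 'b' in the FloatType column
literal = 1.236             # 64-bit Python float used in the filter

print(stored == literal)  # False: exact equality fails
print(stored < literal)   # True: the stored value is rounded down, just below the literal

# Workaround 1: round the literal to 32 bits as well, so both sides match.
print(stored == to_float32(literal))  # True

# Workaround 2: compare with a relative tolerance instead of exact equality.
print(math.isclose(stored, literal, rel_tol=1e-6))  # True
```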
Is this a bug?

Regards,
Meethu Mathew