[jira] [Created] (SPARK-40934) pyspark.pandas.read_csv parses dates, but docs state otherwise

Stefaan Lippens (Jira) Thu, 27 Oct 2022 03:46:23 -0700

Stefaan Lippens created SPARK-40934:
---------------------------------------


             Summary: pyspark.pandas.read_csv parses dates, but docs state otherwise
                 Key: SPARK-40934
                 URL: https://issues.apache.org/jira/browse/SPARK-40934
             Project: Spark
          Issue Type: Bug
          Components: Pandas API on Spark
    Affects Versions: 3.3.1
            Reporter: Stefaan Lippens


From [https://spark.apache.org/docs/latest/api/python/reference/pyspark.pandas/api/pyspark.pandas.read_csv.html]:
{quote}parse_dates:
boolean or list of ints or names or list of lists or dict, default False.
Currently only False is allowed.
{quote}
This documentation suggests that dates are never parsed. In practice, however, they are always parsed, and this cannot be disabled:
{code:python}
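import pyspark.pandas

# parse_dates=False is the only value the docs currently allow
df = pyspark.pandas.read_csv("data.csv", parse_dates=False)
print(df)
print(df.dtypes)  # inspect the inferred dtype of the "date" column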
{code}
with this data in {{data.csv}}:
{code}
date,feature_index,band_0,band_1,band_2
2021-01-05T01:00:00.000+01:00,2,5.0,4.5,3.75
2021-01-05T01:00:00.000+01:00,0,5.0,1.0,2.25
2021-01-05T01:00:00.000+01:00,1,5.0,3.5,4.0
2021-01-15T01:00:00.000+01:00,2,15.0,4.5,3.75
2021-01-15T01:00:00.000+01:00,0,15.0,1.0,2.25
{code}
gives the following output:
{code}
                 date  feature_index  band_0  band_1  band_2
0 2021-01-05 01:00:00              2     5.0     4.5    3.75
1 2021-01-05 01:00:00              0     5.0     1.0    2.25
2 2021-01-05 01:00:00              1     5.0     3.5    4.00
3 2021-01-15 01:00:00              2    15.0     4.5    3.75
4 2021-01-15 01:00:00              0    15.0     1.0    2.25
date             datetime64[ns]
feature_index             int32
band_0                  float64
band_1                  float64
band_2                  float64
dtype: object
{code}
Notice that the dates are parsed even though {{parse_dates=False}} was passed: the {{date}} column has dtype {{datetime64[ns]}}.
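For comparison, here is a minimal sketch of the documented semantics using upstream pandas (not pyspark.pandas), whose {{read_csv}} API pyspark.pandas follows. It inlines the same sample rows through {{io.StringIO}} instead of reading {{data.csv}}. {{CSV_DATA}} and {{pdf}} are illustrative names only. In plain pandas, {{parse_dates=False}} leaves the {{date}} column as text:

{code:python}
import io

import pandas as pd

# Same rows as data.csv above, inlined so the sketch is self-contained
CSV_DATA = """date,feature_index,band_0,band_1,band_2
2021-01-05T01:00:00.000+01:00,2,5.0,4.5,3.75
2021-01-05T01:00:00.000+01:00,0,5.0,1.0,2.25
2021-01-05T01:00:00.000+01:00,1,5.0,3.5,4.0
2021-01-15T01:00:00.000+01:00,2,15.0,4.5,3.75
2021-01-15T01:00:00.000+01:00,0,15.0,1.0,2.25
"""

# Plain pandas: parse_dates=False (also its default) keeps the raw strings
pdf = pd.read_csv(io.StringIO(CSV_DATA), parse_dates=False)
print(pdf.dtypes)  # "date" is a string/object column, not datetime64[ns]
{code}

With pyspark.pandas, the same call with the same data yields {{datetime64[ns]}} for {{date}}, as shown above.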



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
