[ https://issues.apache.org/jira/browse/SPARK-36728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17677187#comment-17677187 ]
Pralabh Kumar commented on SPARK-36728:
---------------------------------------

I think this issue is not reproducible on Spark 3.4. Please confirm.

> Can't create datetime object from anything other than year column Pyspark -> koalas
> ----------------------------------------------------------------------------------
>
>                 Key: SPARK-36728
>                 URL: https://issues.apache.org/jira/browse/SPARK-36728
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 3.3.0
>            Reporter: Bjørn Jørgensen
>            Priority: Major
>         Attachments: pyspark_date.txt, pyspark_date2.txt
>
> If I create a datetime object, it must be from columns named year, month and day.
>
> df = ps.DataFrame({'year': [2015, 2016],
>                    'month': [2, 3],
>                    'day': [4, 5],
>                    'hour': [2, 3],
>                    'minute': [10, 30],
>                    'second': [21, 25]})
> df.info()
> <class 'pyspark.pandas.frame.DataFrame'>
> Int64Index: 2 entries, 1 to 0
> Data columns (total 6 columns):
>  #   Column  Non-Null Count  Dtype
> ---  ------  --------------  -----
>  0   year    2 non-null      int64
>  1   month   2 non-null      int64
>  2   day     2 non-null      int64
>  3   hour    2 non-null      int64
>  4   minute  2 non-null      int64
>  5   second  2 non-null      int64
> dtypes: int64(6)
>
> df['date'] = ps.to_datetime(df[['year', 'month', 'day']])
> df.info()
> <class 'pyspark.pandas.frame.DataFrame'>
> Int64Index: 2 entries, 1 to 0
> Data columns (total 7 columns):
>  #   Column  Non-Null Count  Dtype
> ---  ------  --------------  -----
>  0   year    2 non-null      int64
>  1   month   2 non-null      int64
>  2   day     2 non-null      int64
>  3   hour    2 non-null      int64
>  4   minute  2 non-null      int64
>  5   second  2 non-null      int64
>  6   date    2 non-null      datetime64
> dtypes: datetime64(1), int64(6)
>
> df_test = ps.DataFrame({'testyear': [2015, 2016],
>                         'testmonth': [2, 3],
>                         'testday': [4, 5],
>                         'hour': [2, 3],
>                         'minute': [10, 30],
>                         'second': [21, 25]})
> df_test['date'] = ps.to_datetime(df[['testyear', 'testmonth', 'testday']])
> ---------------------------------------------------------------------------
> KeyError                                  Traceback (most recent call last)
> /tmp/ipykernel_73/904491906.py in <module>
> ----> 1 df_test['date'] = ps.to_datetime(df[['testyear', 'testmonth', 'testday']])
> /opt/spark/python/pyspark/pandas/frame.py in __getitem__(self, key)
>   11853             return self.loc[:, key]
>   11854         elif is_list_like(key):
> > 11855             return self.loc[:, list(key)]
>   11856         raise NotImplementedError(key)
>   11857
> /opt/spark/python/pyspark/pandas/indexing.py in __getitem__(self, key)
>     476                 returns_series,
>     477                 series_name,
> --> 478             ) = self._select_cols(cols_sel)
>     479
>     480         if cond is None and limit is None and returns_series:
> /opt/spark/python/pyspark/pandas/indexing.py in _select_cols(self, cols_sel, missing_keys)
>     322             return self._select_cols_else(cols_sel, missing_keys)
>     323         elif is_list_like(cols_sel):
> --> 324             return self._select_cols_by_iterable(cols_sel, missing_keys)
>     325         else:
>     326             return self._select_cols_else(cols_sel, missing_keys)
> /opt/spark/python/pyspark/pandas/indexing.py in _select_cols_by_iterable(self, cols_sel, missing_keys)
>    1352             if not found:
>    1353                 if missing_keys is None:
> -> 1354                     raise KeyError("['{}'] not in index".format(name_like_string(key)))
>    1355                 else:
>    1356                     missing_keys.append(key)
> KeyError: "['testyear'] not in index"
>
> df_test
>    testyear  testmonth  testday  hour  minute  second
> 0      2015          2        4     2      10      21
> 1      2016          3        5     3      30      25

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
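For context, plain pandas has the same contract that the report runs into: `pd.to_datetime()` can only assemble a datetime from a DataFrame whose columns carry the standard unit names (`year`, `month`, `day`, optionally `hour`/`minute`/`second`). A minimal plain-pandas sketch, using the column names from the report; the rename-based workaround is an assumption on my part, not part of the report:

```python
import pandas as pd

# Plain pandas shows the same behaviour the report describes for
# pyspark.pandas: to_datetime() only assembles a datetime from columns
# with the standard unit names (year, month, day, ...).
df_test = pd.DataFrame({'testyear': [2015, 2016],
                        'testmonth': [2, 3],
                        'testday': [4, 5]})

try:
    pd.to_datetime(df_test)        # non-standard names are rejected
    err_name = None
except ValueError as e:
    err_name = type(e).__name__

# Hypothetical workaround (not from the report): rename the columns to
# the names to_datetime() expects, then assemble.
renamed = df_test.rename(columns={'testyear': 'year',
                                  'testmonth': 'month',
                                  'testday': 'day'})
dates = pd.to_datetime(renamed)
print(err_name, dates.dtype)
```

Note also that the reproduction in the report selects the test columns from `df` (which has no `testyear`) rather than `df_test`, which is what the quoted `KeyError` traceback reflects.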