[ https://issues.apache.org/jira/browse/SPARK-36728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17677187#comment-17677187 ]
Pralabh Kumar commented on SPARK-36728:
---------------------------------------

I think this issue is not reproducible on Spark 3.4. Please confirm.

> Can't create datetime object from anything other than year column Pyspark -> koalas
> ----------------------------------------------------------------------------------
>
>                 Key: SPARK-36728
>                 URL: https://issues.apache.org/jira/browse/SPARK-36728
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 3.3.0
>            Reporter: Bjørn Jørgensen
>            Priority: Major
>         Attachments: pyspark_date.txt, pyspark_date2.txt
>
> If I create a datetime object, it must be from columns named year, month and day.
>
> df = ps.DataFrame({'year': [2015, 2016],
>                    'month': [2, 3],
>                    'day': [4, 5],
>                    'hour': [2, 3],
>                    'minute': [10, 30],
>                    'second': [21, 25]})
> df.info()
> <class 'pyspark.pandas.frame.DataFrame'>
> Int64Index: 2 entries, 1 to 0
> Data columns (total 6 columns):
>  #   Column  Non-Null Count  Dtype
> ---  ------  --------------  -----
>  0   year    2 non-null      int64
>  1   month   2 non-null      int64
>  2   day     2 non-null      int64
>  3   hour    2 non-null      int64
>  4   minute  2 non-null      int64
>  5   second  2 non-null      int64
> dtypes: int64(6)
>
> df['date'] = ps.to_datetime(df[['year', 'month', 'day']])
> df.info()
> <class 'pyspark.pandas.frame.DataFrame'>
> Int64Index: 2 entries, 1 to 0
> Data columns (total 7 columns):
>  #   Column  Non-Null Count  Dtype
> ---  ------  --------------  -----
>  0   year    2 non-null      int64
>  1   month   2 non-null      int64
>  2   day     2 non-null      int64
>  3   hour    2 non-null      int64
>  4   minute  2 non-null      int64
>  5   second  2 non-null      int64
>  6   date    2 non-null      datetime64
> dtypes: datetime64(1), int64(6)
>
> df_test = ps.DataFrame({'testyear': [2015, 2016],
>                         'testmonth': [2, 3],
>                         'testday': [4, 5],
>                         'hour': [2, 3],
>                         'minute': [10, 30],
>                         'second': [21, 25]})
> df_test['date'] = ps.to_datetime(df[['testyear', 'testmonth', 'testday']])
> ---------------------------------------------------------------------------
> KeyError                                  Traceback (most recent call last)
> /tmp/ipykernel_73/904491906.py in <module>
> ----> 1 df_test['date'] = ps.to_datetime(df[['testyear', 'testmonth', 'testday']])
> /opt/spark/python/pyspark/pandas/frame.py in __getitem__(self, key)
>   11853             return self.loc[:, key]
>   11854         elif is_list_like(key):
> > 11855             return self.loc[:, list(key)]
>   11856         raise NotImplementedError(key)
>   11857
> /opt/spark/python/pyspark/pandas/indexing.py in __getitem__(self, key)
>     476                 returns_series,
>     477                 series_name,
> --> 478             ) = self._select_cols(cols_sel)
>     479
>     480         if cond is None and limit is None and returns_series:
> /opt/spark/python/pyspark/pandas/indexing.py in _select_cols(self, cols_sel, missing_keys)
>     322             return self._select_cols_else(cols_sel, missing_keys)
>     323         elif is_list_like(cols_sel):
> --> 324             return self._select_cols_by_iterable(cols_sel, missing_keys)
>     325         else:
>     326             return self._select_cols_else(cols_sel, missing_keys)
> /opt/spark/python/pyspark/pandas/indexing.py in _select_cols_by_iterable(self, cols_sel, missing_keys)
>    1352             if not found:
>    1353                 if missing_keys is None:
> -> 1354                     raise KeyError("['{}'] not in index".format(name_like_string(key)))
>    1355                 else:
>    1356                     missing_keys.append(key)
> KeyError: "['testyear'] not in index"
>
> df_test
>    testyear  testmonth  testday  hour  minute  second
> 0      2015          2        4     2      10      21
> 1      2016          3        5     3      30      25

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
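For context, plain pandas has the same contract that the report runs into: `pd.to_datetime()` can only assemble a datetime from a DataFrame whose columns carry the standard unit names (`year`, `month`, `day`, optionally `hour`/`minute`/`second`). A minimal plain-pandas sketch, using the column names from the report; the rename-based workaround is an assumption on my part, not part of the report:

```python
import pandas as pd

# Plain pandas shows the same behaviour the report describes for
# pyspark.pandas: to_datetime() only assembles a datetime from columns
# with the standard unit names (year, month, day, ...).
df_test = pd.DataFrame({'testyear': [2015, 2016],
                        'testmonth': [2, 3],
                        'testday': [4, 5]})

try:
    pd.to_datetime(df_test)        # non-standard names are rejected
    err_name = None
except ValueError as e:
    err_name = type(e).__name__

# Hypothetical workaround (not from the report): rename the columns to
# the names to_datetime() expects, then assemble.
renamed = df_test.rename(columns={'testyear': 'year',
                                  'testmonth': 'month',
                                  'testday': 'day'})
dates = pd.to_datetime(renamed)
print(err_name, dates.dtype)
```

Note also that the reproduction in the report selects the test columns from `df` (which has no `testyear`) rather than `df_test`, which is what the quoted `KeyError` traceback reflects.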