Strange error with hive null data
I have some Hive data loaded with sqoop that looks like this: 0: jdbc:drill:> select curr_fiscal_week, load_dts from hive.dim.`YEAR_DATE_DIM`; +---++ | curr_fiscal_week |load_dts| +---++ | 12.0 | 2014-02-07 15:11:35.0 | | 12.0 | 2014-02-07 15:11:35.0 | | 12.0 | 2012-11-08 00:00:00.0 | | 12.0 | 2012-11-08 00:00:00.0 | | 12.0 | 2012-11-08 00:00:00.0 | | 12.0 | 2012-11-08 00:00:00.0 | | 12.0 | 2012-11-08 00:00:00.0 | | 12.0 | 2012-11-08 00:00:00.0 | | 12.0 | 2012-11-08 00:00:00.0 | | 12.0 | 2012-11-08 00:00:00.0 | | 12.0 | 2012-11-08 00:00:00.0 | | 12.0 | 2012-11-08 00:00:00.0 | | 12.0 | 2012-11-08 00:00:00.0 | | 12.0 | 2012-11-08 00:00:00.0 | | 12.0 | 2012-11-08 00:00:00.0 | | 12.0 | 2012-11-08 00:00:00.0 | | 12.0 | 2012-11-08 00:00:00.0 | | 12.0 | 2012-11-08 00:00:00.0 | | 12.0 | 2012-11-08 00:00:00.0 | | 12.0 | 2012-11-08 00:00:00.0 | | 12.0 | 2012-11-08 00:00:00.0 | | 12.0 | 2012-11-08 00:00:00.0 | | 12.0 | 2015-01-13 00:00:00.0 | | 12.0 | 2015-01-13 00:00:00.0 | | 12.0 | null | | 12.0 | null | +---++ 26 rows selected (0.229 seconds) When I’m trying to create a drill parquet table by using a CTAS with a CASE statement to handle the nulls I get an error: 0: jdbc:drill:> CREATE TABLE dfs.DIM.`YEAR_DATE_DIM_data` as SELECT . . . . . . . > CAST(`YR_DATE_CD` as INT) as `YR_DATE_CD`, . . . . . . . > CAST(`FISCAL_YR` as INT) as `FISCAL_YR`, . . . . . . . > CASE . . . . . . . > WHEN `FISCAL_YR_BEG_DT` = 'null' THEN NULL ELSE CAST(`FISCAL_YR_BEG_DT` as TIMESTAMP) . . . . . . . > END as `FISCAL_YR_BEG_DT`, . . . . . . . > CASE . . . . . . . > WHEN `FISCAL_YR_END_DT` = 'null' THEN NULL ELSE CAST(`FISCAL_YR_END_DT` as TIMESTAMP) . . . . . . . > END as `FISCAL_YR_END_DT`, . . . . . . . > CASE . . . . . . . > WHEN `CURR_FISCAL_YR` = 'null' THEN NULL ELSE CAST(`CURR_FISCAL_YR` as INT) . . . . . . . > END as `CURR_FISCAL_YR`, . . . . . . . > CASE . . . . . . . > WHEN `CURR_FISCAL_HALF` = 'null' THEN NULL ELSE CAST(`CURR_FISCAL_HALF` as INT) . . . . . . . > END as `CURR_FISCAL_HALF`, . . . . . . . > CASE . . . . . . . > WHEN `CURR_FISCAL_QTR` = 'null' THEN NULL ELSE CAST(`CURR_FISCAL_QTR` as INT) . . . . . . . > END as `CURR_FISCAL_QTR`, . . . . . . . > CASE . . . . . . . > WHEN `CURR_FISCAL_MO` = 'null' THEN NULL ELSE CAST(`CURR_FISCAL_MO` as INT) . . . . . . . > END as `CURR_FISCAL_MO`, . . . . . . . > CASE . . . . . . . > WHEN `CURR_FISCAL_WEEK` = 'null' THEN NULL ELSE CAST(`CURR_FISCAL_WEEK` as INT) . . . . . . . > END as `CURR_FISCAL_WEEK`, . . . . . . . > CASE . . . . . . . > WHEN `LOAD_DTS` = 'null' THEN NULL ELSE CAST(`LOAD_DTS` as TIMESTAMP) . . . . . . . > END as `LOAD_DTS` . . . . . . . > FROM hive.dim.`YEAR_DATE_DIM`; Error: SYSTEM ERROR: NumberFormatException: For input string: "null" Fragment 0:0 [Error Id: 2755cf80-04b3-4fa7-be66-20158f115f27 on chlhad02:31010] (state=,code=0) Simplified query: Using a CASE statement only on the field containing the null values works: 0: jdbc:drill:> select cast(curr_fiscal_week as int) as curr_fiscal_week, case when load_dts = 'null' then null else cast(load_dts as timestamp) end as `load_dts` from hive.dim.`YEAR_DATE_DIM`; +---++ | curr_fiscal_week |load_dts| +---++ | 12| 2014-02-07 15:11:35.0 | | 12| 2014-02-07 15:11:35.0 | | 12| 2012-11-08 00:00:00.0 | | 12| 2012-11-08 00:00:00.0 | | 12| 2012-11-08 00:00:00.0 | | 12| 2012-11-08 00:00:00.0 | | 12| 2012-11-08 00:00:00.0 | | 12| 2012-11-08 00:00:00.0 | | 12| 2012-11-08 00:00:00.0 | | 12| 2012-11-08 00:00:00.0 | | 12| 2012-11-08 00:00:00.0 | | 12| 2012-11-08 00:00:00.0 | | 12| 2012-11-08 00:00:00.0 | | 12| 2012-11-08 00:00:00.0 | | 12| 2012-11-08 00:00:00.0 | | 12| 2012-11-08 00:00:00.0 | | 12| 2012-11-08 00:00:00.0 | | 12| 2012-11-08 00:00:00.0 | | 12| 2012-11-08 00:00:00.0 | | 12| 2012-11-08 00:00:00.0 | | 12| 2012-11-08 00:00:00.0 | | 12| 2012-11-08 00:00:00.0 | | 12| 2015-01-13 00:00:00.0 | | 12| 2015-01-13 00:00:00.0 | | 12| null | | 12| null
Re: Strange error with hive null data
Try using is null instead of = 'null'. --Andries > On Mar 18, 2016, at 7:36 PM, Jason Altekrusewrote: > > I get that, but you are trying to compare your integers (curr_fiscal_week) > with the string 'null' in your case statement. Comparison between a string > and an int results in casting the string as an int. The string that is > failing to parse in your literal, not data appearing in the file. > > Jason Altekruse > Software Engineer at Dremio > Apache Drill Committer > >> On Fri, Mar 18, 2016 at 7:33 PM, Christopher Matta wrote: >> >> Jason, >> The first column is stored as an INT on disk, but there are no NULL values >> in it, the only NULL values are in the timestamp column (the last two >> records). >> >> Chris Matta >> cma...@mapr.com >> 215-701-3146 >> >> On Fri, Mar 18, 2016 at 10:27 PM, Jason Altekruse >> wrote: >> >>> Does the first column have a type of int? In that case what might be >>> failing here is your string literal 'null' failing to be implicitly cast >> to >>> match the column. >>> >>> Jason Altekruse >>> Software Engineer at Dremio >>> Apache Drill Committer >>> >>> On Fri, Mar 18, 2016 at 6:22 PM, Christopher Matta >>> wrote: >>> I have some Hive data loaded with sqoop that looks like this: 0: jdbc:drill:> select curr_fiscal_week, load_dts from hive.dim.`YEAR_DATE_DIM`; +---++ | curr_fiscal_week |load_dts| +---++ | 12.0 | 2014-02-07 15:11:35.0 | | 12.0 | 2014-02-07 15:11:35.0 | | 12.0 | 2012-11-08 00:00:00.0 | | 12.0 | 2012-11-08 00:00:00.0 | | 12.0 | 2012-11-08 00:00:00.0 | | 12.0 | 2012-11-08 00:00:00.0 | | 12.0 | 2012-11-08 00:00:00.0 | | 12.0 | 2012-11-08 00:00:00.0 | | 12.0 | 2012-11-08 00:00:00.0 | | 12.0 | 2012-11-08 00:00:00.0 | | 12.0 | 2012-11-08 00:00:00.0 | | 12.0 | 2012-11-08 00:00:00.0 | | 12.0 | 2012-11-08 00:00:00.0 | | 12.0 | 2012-11-08 00:00:00.0 | | 12.0 | 2012-11-08 00:00:00.0 | | 12.0 | 2012-11-08 00:00:00.0 | | 12.0 | 2012-11-08 00:00:00.0 | | 12.0 | 2012-11-08 00:00:00.0 | | 12.0 | 2012-11-08 00:00:00.0 | | 12.0 | 2012-11-08 00:00:00.0 | | 12.0 | 2012-11-08 00:00:00.0 | | 12.0 | 2012-11-08 00:00:00.0 | | 12.0 | 2015-01-13 00:00:00.0 | | 12.0 | 2015-01-13 00:00:00.0 | | 12.0 | null | | 12.0 | null | +---++ 26 rows selected (0.229 seconds) When I’m trying to create a drill parquet table by using a CTAS with a >>> CASE statement to handle the nulls I get an error: 0: jdbc:drill:> CREATE TABLE dfs.DIM.`YEAR_DATE_DIM_data` as SELECT . . . . . . . > CAST(`YR_DATE_CD` as INT) as `YR_DATE_CD`, . . . . . . . > CAST(`FISCAL_YR` as INT) as `FISCAL_YR`, . . . . . . . > CASE . . . . . . . > WHEN `FISCAL_YR_BEG_DT` = 'null' THEN NULL ELSE CAST(`FISCAL_YR_BEG_DT` as TIMESTAMP) . . . . . . . > END as `FISCAL_YR_BEG_DT`, . . . . . . . > CASE . . . . . . . > WHEN `FISCAL_YR_END_DT` = 'null' THEN NULL ELSE CAST(`FISCAL_YR_END_DT` as TIMESTAMP) . . . . . . . > END as `FISCAL_YR_END_DT`, . . . . . . . > CASE . . . . . . . > WHEN `CURR_FISCAL_YR` = 'null' THEN NULL ELSE CAST(`CURR_FISCAL_YR` as INT) . . . . . . . > END as `CURR_FISCAL_YR`, . . . . . . . > CASE . . . . . . . > WHEN `CURR_FISCAL_HALF` = 'null' THEN NULL ELSE CAST(`CURR_FISCAL_HALF` as INT) . . . . . . . > END as `CURR_FISCAL_HALF`, . . . . . . . > CASE . . . . . . . > WHEN `CURR_FISCAL_QTR` = 'null' THEN NULL ELSE CAST(`CURR_FISCAL_QTR` as INT) . . . . . . . > END as `CURR_FISCAL_QTR`, . . . . . . . > CASE . . . . . . . > WHEN `CURR_FISCAL_MO` = 'null' THEN NULL ELSE CAST(`CURR_FISCAL_MO` as INT) . . . . . . . > END as `CURR_FISCAL_MO`, . . . . . . . > CASE . . . . . . . > WHEN `CURR_FISCAL_WEEK` = 'null' THEN NULL ELSE CAST(`CURR_FISCAL_WEEK` as INT) . . . . . . . > END as `CURR_FISCAL_WEEK`, . . . . . . . > CASE . . . . . . . > WHEN `LOAD_DTS` = 'null' THEN NULL ELSE CAST(`LOAD_DTS` as TIMESTAMP) . . . . . . . > END as `LOAD_DTS` . . . . . . . > FROM hive.dim.`YEAR_DATE_DIM`; Error: SYSTEM ERROR: NumberFormatException: For input string: "null" Fragment 0:0 [Error Id: 2755cf80-04b3-4fa7-be66-20158f115f27 on
Re: Strange error with hive null data
Jason, The first column is stored as an INT on disk, but there are no NULL values in it, the only NULL values are in the timestamp column (the last two records). Chris Matta cma...@mapr.com 215-701-3146 On Fri, Mar 18, 2016 at 10:27 PM, Jason Altekrusewrote: > Does the first column have a type of int? In that case what might be > failing here is your string literal 'null' failing to be implicitly cast to > match the column. > > Jason Altekruse > Software Engineer at Dremio > Apache Drill Committer > > On Fri, Mar 18, 2016 at 6:22 PM, Christopher Matta > wrote: > > > I have some Hive data loaded with sqoop that looks like this: > > > > 0: jdbc:drill:> select curr_fiscal_week, load_dts from > > hive.dim.`YEAR_DATE_DIM`; > > +---++ > > | curr_fiscal_week |load_dts| > > +---++ > > | 12.0 | 2014-02-07 15:11:35.0 | > > | 12.0 | 2014-02-07 15:11:35.0 | > > | 12.0 | 2012-11-08 00:00:00.0 | > > | 12.0 | 2012-11-08 00:00:00.0 | > > | 12.0 | 2012-11-08 00:00:00.0 | > > | 12.0 | 2012-11-08 00:00:00.0 | > > | 12.0 | 2012-11-08 00:00:00.0 | > > | 12.0 | 2012-11-08 00:00:00.0 | > > | 12.0 | 2012-11-08 00:00:00.0 | > > | 12.0 | 2012-11-08 00:00:00.0 | > > | 12.0 | 2012-11-08 00:00:00.0 | > > | 12.0 | 2012-11-08 00:00:00.0 | > > | 12.0 | 2012-11-08 00:00:00.0 | > > | 12.0 | 2012-11-08 00:00:00.0 | > > | 12.0 | 2012-11-08 00:00:00.0 | > > | 12.0 | 2012-11-08 00:00:00.0 | > > | 12.0 | 2012-11-08 00:00:00.0 | > > | 12.0 | 2012-11-08 00:00:00.0 | > > | 12.0 | 2012-11-08 00:00:00.0 | > > | 12.0 | 2012-11-08 00:00:00.0 | > > | 12.0 | 2012-11-08 00:00:00.0 | > > | 12.0 | 2012-11-08 00:00:00.0 | > > | 12.0 | 2015-01-13 00:00:00.0 | > > | 12.0 | 2015-01-13 00:00:00.0 | > > | 12.0 | null | > > | 12.0 | null | > > +---++ > > 26 rows selected (0.229 seconds) > > > > When I’m trying to create a drill parquet table by using a CTAS with a > CASE > > statement to handle the nulls I get an error: > > > > 0: jdbc:drill:> CREATE TABLE dfs.DIM.`YEAR_DATE_DIM_data` as SELECT > > . . . . . . . > CAST(`YR_DATE_CD` as INT) as `YR_DATE_CD`, > > . . . . . . . > CAST(`FISCAL_YR` as INT) as `FISCAL_YR`, > > . . . . . . . > CASE > > . . . . . . . > WHEN `FISCAL_YR_BEG_DT` = 'null' THEN NULL ELSE > > CAST(`FISCAL_YR_BEG_DT` as TIMESTAMP) > > . . . . . . . > END as `FISCAL_YR_BEG_DT`, > > . . . . . . . > CASE > > . . . . . . . > WHEN `FISCAL_YR_END_DT` = 'null' THEN NULL ELSE > > CAST(`FISCAL_YR_END_DT` as TIMESTAMP) > > . . . . . . . > END as `FISCAL_YR_END_DT`, > > . . . . . . . > CASE > > . . . . . . . > WHEN `CURR_FISCAL_YR` = 'null' THEN NULL ELSE > > CAST(`CURR_FISCAL_YR` as INT) > > . . . . . . . > END as `CURR_FISCAL_YR`, > > . . . . . . . > CASE > > . . . . . . . > WHEN `CURR_FISCAL_HALF` = 'null' THEN NULL ELSE > > CAST(`CURR_FISCAL_HALF` as INT) > > . . . . . . . > END as `CURR_FISCAL_HALF`, > > . . . . . . . > CASE > > . . . . . . . > WHEN `CURR_FISCAL_QTR` = 'null' THEN NULL ELSE > > CAST(`CURR_FISCAL_QTR` as INT) > > . . . . . . . > END as `CURR_FISCAL_QTR`, > > . . . . . . . > CASE > > . . . . . . . > WHEN `CURR_FISCAL_MO` = 'null' THEN NULL ELSE > > CAST(`CURR_FISCAL_MO` as INT) > > . . . . . . . > END as `CURR_FISCAL_MO`, > > . . . . . . . > CASE > > . . . . . . . > WHEN `CURR_FISCAL_WEEK` = 'null' THEN NULL ELSE > > CAST(`CURR_FISCAL_WEEK` as INT) > > . . . . . . . > END as `CURR_FISCAL_WEEK`, > > . . . . . . . > CASE > > . . . . . . . > WHEN `LOAD_DTS` = 'null' THEN NULL ELSE > > CAST(`LOAD_DTS` as TIMESTAMP) > > . . . . . . . > END as `LOAD_DTS` > > . . . . . . . > FROM hive.dim.`YEAR_DATE_DIM`; > > Error: SYSTEM ERROR: NumberFormatException: For input string: "null" > > > > Fragment 0:0 > > > > [Error Id: 2755cf80-04b3-4fa7-be66-20158f115f27 on chlhad02:31010] > > (state=,code=0) > > > > Simplified query: > > Using a CASE statement only on the field containing the null values > works: > > > > 0: jdbc:drill:> select cast(curr_fiscal_week as int) as > > curr_fiscal_week, case when load_dts = 'null' then null else > > cast(load_dts as timestamp) end as `load_dts` from > > hive.dim.`YEAR_DATE_DIM`; > > +---++ > > | curr_fiscal_week |load_dts| > > +---++ > > | 12| 2014-02-07 15:11:35.0 | > > | 12| 2014-02-07 15:11:35.0 | > > | 12| 2012-11-08 00:00:00.0 | > > | 12| 2012-11-08 00:00:00.0 | > > | 12