Strange error with hive null data

2016-03-19 Thread Christopher Matta
I have some Hive data loaded with sqoop that looks like this:

0: jdbc:drill:> select curr_fiscal_week, load_dts from hive.dim.`YEAR_DATE_DIM`;
+---++
| curr_fiscal_week  |load_dts|
+---++
| 12.0  | 2014-02-07 15:11:35.0  |
| 12.0  | 2014-02-07 15:11:35.0  |
| 12.0  | 2012-11-08 00:00:00.0  |
| 12.0  | 2012-11-08 00:00:00.0  |
| 12.0  | 2012-11-08 00:00:00.0  |
| 12.0  | 2012-11-08 00:00:00.0  |
| 12.0  | 2012-11-08 00:00:00.0  |
| 12.0  | 2012-11-08 00:00:00.0  |
| 12.0  | 2012-11-08 00:00:00.0  |
| 12.0  | 2012-11-08 00:00:00.0  |
| 12.0  | 2012-11-08 00:00:00.0  |
| 12.0  | 2012-11-08 00:00:00.0  |
| 12.0  | 2012-11-08 00:00:00.0  |
| 12.0  | 2012-11-08 00:00:00.0  |
| 12.0  | 2012-11-08 00:00:00.0  |
| 12.0  | 2012-11-08 00:00:00.0  |
| 12.0  | 2012-11-08 00:00:00.0  |
| 12.0  | 2012-11-08 00:00:00.0  |
| 12.0  | 2012-11-08 00:00:00.0  |
| 12.0  | 2012-11-08 00:00:00.0  |
| 12.0  | 2012-11-08 00:00:00.0  |
| 12.0  | 2012-11-08 00:00:00.0  |
| 12.0  | 2015-01-13 00:00:00.0  |
| 12.0  | 2015-01-13 00:00:00.0  |
| 12.0  | null   |
| 12.0  | null   |
+---++
26 rows selected (0.229 seconds)

When I’m trying to create a drill parquet table by using a CTAS with a CASE
statement to handle the nulls I get an error:

0: jdbc:drill:> CREATE TABLE dfs.DIM.`YEAR_DATE_DIM_data` as SELECT
. . . . . . . > CAST(`YR_DATE_CD` as INT) as `YR_DATE_CD`,
. . . . . . . > CAST(`FISCAL_YR` as INT) as `FISCAL_YR`,
. . . . . . . > CASE
. . . . . . . > WHEN `FISCAL_YR_BEG_DT` = 'null' THEN NULL ELSE
CAST(`FISCAL_YR_BEG_DT` as TIMESTAMP)
. . . . . . . > END as `FISCAL_YR_BEG_DT`,
. . . . . . . > CASE
. . . . . . . > WHEN `FISCAL_YR_END_DT` = 'null' THEN NULL ELSE
CAST(`FISCAL_YR_END_DT` as TIMESTAMP)
. . . . . . . > END as `FISCAL_YR_END_DT`,
. . . . . . . > CASE
. . . . . . . > WHEN `CURR_FISCAL_YR` = 'null' THEN NULL ELSE
CAST(`CURR_FISCAL_YR` as INT)
. . . . . . . > END as `CURR_FISCAL_YR`,
. . . . . . . > CASE
. . . . . . . > WHEN `CURR_FISCAL_HALF` = 'null' THEN NULL ELSE
CAST(`CURR_FISCAL_HALF` as INT)
. . . . . . . > END as `CURR_FISCAL_HALF`,
. . . . . . . > CASE
. . . . . . . > WHEN `CURR_FISCAL_QTR` = 'null' THEN NULL ELSE
CAST(`CURR_FISCAL_QTR` as INT)
. . . . . . . > END as `CURR_FISCAL_QTR`,
. . . . . . . > CASE
. . . . . . . > WHEN `CURR_FISCAL_MO` = 'null' THEN NULL ELSE
CAST(`CURR_FISCAL_MO` as INT)
. . . . . . . > END as `CURR_FISCAL_MO`,
. . . . . . . > CASE
. . . . . . . > WHEN `CURR_FISCAL_WEEK` = 'null' THEN NULL ELSE
CAST(`CURR_FISCAL_WEEK` as INT)
. . . . . . . > END as `CURR_FISCAL_WEEK`,
. . . . . . . > CASE
. . . . . . . > WHEN `LOAD_DTS` = 'null' THEN NULL ELSE
CAST(`LOAD_DTS` as TIMESTAMP)
. . . . . . . > END as `LOAD_DTS`
. . . . . . . > FROM hive.dim.`YEAR_DATE_DIM`;
Error: SYSTEM ERROR: NumberFormatException: For input string: "null"

Fragment 0:0

[Error Id: 2755cf80-04b3-4fa7-be66-20158f115f27 on chlhad02:31010]
(state=,code=0)

Simplified query:
Using a CASE statement only on the field containing the null values works:

0: jdbc:drill:> select cast(curr_fiscal_week as int) as
curr_fiscal_week, case when load_dts = 'null' then null else
cast(load_dts as timestamp) end as `load_dts` from
hive.dim.`YEAR_DATE_DIM`;
+---++
| curr_fiscal_week  |load_dts|
+---++
| 12| 2014-02-07 15:11:35.0  |
| 12| 2014-02-07 15:11:35.0  |
| 12| 2012-11-08 00:00:00.0  |
| 12| 2012-11-08 00:00:00.0  |
| 12| 2012-11-08 00:00:00.0  |
| 12| 2012-11-08 00:00:00.0  |
| 12| 2012-11-08 00:00:00.0  |
| 12| 2012-11-08 00:00:00.0  |
| 12| 2012-11-08 00:00:00.0  |
| 12| 2012-11-08 00:00:00.0  |
| 12| 2012-11-08 00:00:00.0  |
| 12| 2012-11-08 00:00:00.0  |
| 12| 2012-11-08 00:00:00.0  |
| 12| 2012-11-08 00:00:00.0  |
| 12| 2012-11-08 00:00:00.0  |
| 12| 2012-11-08 00:00:00.0  |
| 12| 2012-11-08 00:00:00.0  |
| 12| 2012-11-08 00:00:00.0  |
| 12| 2012-11-08 00:00:00.0  |
| 12| 2012-11-08 00:00:00.0  |
| 12| 2012-11-08 00:00:00.0  |
| 12| 2012-11-08 00:00:00.0  |
| 12| 2015-01-13 00:00:00.0  |
| 12| 2015-01-13 00:00:00.0  |
| 12| null   |
| 12| null 

Re: Strange error with hive null data

2016-03-18 Thread Andries Engelbrecht
Try using is null instead of = 'null'.

--Andries

> On Mar 18, 2016, at 7:36 PM, Jason Altekruse  wrote:
> 
> I get that, but you are trying to compare your integers (curr_fiscal_week)
> with the string 'null' in your case statement. Comparison between a string
> and an int results in casting the string as an int. The string that is
> failing to parse in your literal, not data appearing in the file.
> 
> Jason Altekruse
> Software Engineer at Dremio
> Apache Drill Committer
> 
>> On Fri, Mar 18, 2016 at 7:33 PM, Christopher Matta  wrote:
>> 
>> Jason,
>> The first column is stored as an INT on disk, but there are no NULL values
>> in it, the only NULL values are in the timestamp column (the last two
>> records).
>> 
>> Chris Matta
>> cma...@mapr.com
>> 215-701-3146
>> 
>> On Fri, Mar 18, 2016 at 10:27 PM, Jason Altekruse 
>> wrote:
>> 
>>> Does the first column have a type of int? In that case what might be
>>> failing here is your string literal 'null' failing to be implicitly cast
>> to
>>> match the column.
>>> 
>>> Jason Altekruse
>>> Software Engineer at Dremio
>>> Apache Drill Committer
>>> 
>>> On Fri, Mar 18, 2016 at 6:22 PM, Christopher Matta 
>>> wrote:
>>> 
 I have some Hive data loaded with sqoop that looks like this:
 
 0: jdbc:drill:> select curr_fiscal_week, load_dts from
 hive.dim.`YEAR_DATE_DIM`;
 +---++
 | curr_fiscal_week  |load_dts|
 +---++
 | 12.0  | 2014-02-07 15:11:35.0  |
 | 12.0  | 2014-02-07 15:11:35.0  |
 | 12.0  | 2012-11-08 00:00:00.0  |
 | 12.0  | 2012-11-08 00:00:00.0  |
 | 12.0  | 2012-11-08 00:00:00.0  |
 | 12.0  | 2012-11-08 00:00:00.0  |
 | 12.0  | 2012-11-08 00:00:00.0  |
 | 12.0  | 2012-11-08 00:00:00.0  |
 | 12.0  | 2012-11-08 00:00:00.0  |
 | 12.0  | 2012-11-08 00:00:00.0  |
 | 12.0  | 2012-11-08 00:00:00.0  |
 | 12.0  | 2012-11-08 00:00:00.0  |
 | 12.0  | 2012-11-08 00:00:00.0  |
 | 12.0  | 2012-11-08 00:00:00.0  |
 | 12.0  | 2012-11-08 00:00:00.0  |
 | 12.0  | 2012-11-08 00:00:00.0  |
 | 12.0  | 2012-11-08 00:00:00.0  |
 | 12.0  | 2012-11-08 00:00:00.0  |
 | 12.0  | 2012-11-08 00:00:00.0  |
 | 12.0  | 2012-11-08 00:00:00.0  |
 | 12.0  | 2012-11-08 00:00:00.0  |
 | 12.0  | 2012-11-08 00:00:00.0  |
 | 12.0  | 2015-01-13 00:00:00.0  |
 | 12.0  | 2015-01-13 00:00:00.0  |
 | 12.0  | null   |
 | 12.0  | null   |
 +---++
 26 rows selected (0.229 seconds)
 
 When I’m trying to create a drill parquet table by using a CTAS with a
>>> CASE
 statement to handle the nulls I get an error:
 
 0: jdbc:drill:> CREATE TABLE dfs.DIM.`YEAR_DATE_DIM_data` as SELECT
 . . . . . . . > CAST(`YR_DATE_CD` as INT) as `YR_DATE_CD`,
 . . . . . . . > CAST(`FISCAL_YR` as INT) as `FISCAL_YR`,
 . . . . . . . > CASE
 . . . . . . . > WHEN `FISCAL_YR_BEG_DT` = 'null' THEN NULL ELSE
 CAST(`FISCAL_YR_BEG_DT` as TIMESTAMP)
 . . . . . . . > END as `FISCAL_YR_BEG_DT`,
 . . . . . . . > CASE
 . . . . . . . > WHEN `FISCAL_YR_END_DT` = 'null' THEN NULL ELSE
 CAST(`FISCAL_YR_END_DT` as TIMESTAMP)
 . . . . . . . > END as `FISCAL_YR_END_DT`,
 . . . . . . . > CASE
 . . . . . . . > WHEN `CURR_FISCAL_YR` = 'null' THEN NULL ELSE
 CAST(`CURR_FISCAL_YR` as INT)
 . . . . . . . > END as `CURR_FISCAL_YR`,
 . . . . . . . > CASE
 . . . . . . . > WHEN `CURR_FISCAL_HALF` = 'null' THEN NULL ELSE
 CAST(`CURR_FISCAL_HALF` as INT)
 . . . . . . . > END as `CURR_FISCAL_HALF`,
 . . . . . . . > CASE
 . . . . . . . > WHEN `CURR_FISCAL_QTR` = 'null' THEN NULL ELSE
 CAST(`CURR_FISCAL_QTR` as INT)
 . . . . . . . > END as `CURR_FISCAL_QTR`,
 . . . . . . . > CASE
 . . . . . . . > WHEN `CURR_FISCAL_MO` = 'null' THEN NULL ELSE
 CAST(`CURR_FISCAL_MO` as INT)
 . . . . . . . > END as `CURR_FISCAL_MO`,
 . . . . . . . > CASE
 . . . . . . . > WHEN `CURR_FISCAL_WEEK` = 'null' THEN NULL ELSE
 CAST(`CURR_FISCAL_WEEK` as INT)
 . . . . . . . > END as `CURR_FISCAL_WEEK`,
 . . . . . . . > CASE
 . . . . . . . > WHEN `LOAD_DTS` = 'null' THEN NULL ELSE
 CAST(`LOAD_DTS` as TIMESTAMP)
 . . . . . . . > END as `LOAD_DTS`
 . . . . . . . > FROM hive.dim.`YEAR_DATE_DIM`;
 Error: SYSTEM ERROR: NumberFormatException: For input string: "null"
 
 Fragment 0:0
 
 [Error Id: 2755cf80-04b3-4fa7-be66-20158f115f27 on 

Re: Strange error with hive null data

2016-03-18 Thread Christopher Matta
Jason,
The first column is stored as an INT on disk, but there are no NULL values
in it, the only NULL values are in the timestamp column (the last two
records).

Chris Matta
cma...@mapr.com
215-701-3146

On Fri, Mar 18, 2016 at 10:27 PM, Jason Altekruse  wrote:

> Does the first column have a type of int? In that case what might be
> failing here is your string literal 'null' failing to be implicitly cast to
> match the column.
>
> Jason Altekruse
> Software Engineer at Dremio
> Apache Drill Committer
>
> On Fri, Mar 18, 2016 at 6:22 PM, Christopher Matta 
> wrote:
>
> > I have some Hive data loaded with sqoop that looks like this:
> >
> > 0: jdbc:drill:> select curr_fiscal_week, load_dts from
> > hive.dim.`YEAR_DATE_DIM`;
> > +---++
> > | curr_fiscal_week  |load_dts|
> > +---++
> > | 12.0  | 2014-02-07 15:11:35.0  |
> > | 12.0  | 2014-02-07 15:11:35.0  |
> > | 12.0  | 2012-11-08 00:00:00.0  |
> > | 12.0  | 2012-11-08 00:00:00.0  |
> > | 12.0  | 2012-11-08 00:00:00.0  |
> > | 12.0  | 2012-11-08 00:00:00.0  |
> > | 12.0  | 2012-11-08 00:00:00.0  |
> > | 12.0  | 2012-11-08 00:00:00.0  |
> > | 12.0  | 2012-11-08 00:00:00.0  |
> > | 12.0  | 2012-11-08 00:00:00.0  |
> > | 12.0  | 2012-11-08 00:00:00.0  |
> > | 12.0  | 2012-11-08 00:00:00.0  |
> > | 12.0  | 2012-11-08 00:00:00.0  |
> > | 12.0  | 2012-11-08 00:00:00.0  |
> > | 12.0  | 2012-11-08 00:00:00.0  |
> > | 12.0  | 2012-11-08 00:00:00.0  |
> > | 12.0  | 2012-11-08 00:00:00.0  |
> > | 12.0  | 2012-11-08 00:00:00.0  |
> > | 12.0  | 2012-11-08 00:00:00.0  |
> > | 12.0  | 2012-11-08 00:00:00.0  |
> > | 12.0  | 2012-11-08 00:00:00.0  |
> > | 12.0  | 2012-11-08 00:00:00.0  |
> > | 12.0  | 2015-01-13 00:00:00.0  |
> > | 12.0  | 2015-01-13 00:00:00.0  |
> > | 12.0  | null   |
> > | 12.0  | null   |
> > +---++
> > 26 rows selected (0.229 seconds)
> >
> > When I’m trying to create a drill parquet table by using a CTAS with a
> CASE
> > statement to handle the nulls I get an error:
> >
> > 0: jdbc:drill:> CREATE TABLE dfs.DIM.`YEAR_DATE_DIM_data` as SELECT
> > . . . . . . . > CAST(`YR_DATE_CD` as INT) as `YR_DATE_CD`,
> > . . . . . . . > CAST(`FISCAL_YR` as INT) as `FISCAL_YR`,
> > . . . . . . . > CASE
> > . . . . . . . > WHEN `FISCAL_YR_BEG_DT` = 'null' THEN NULL ELSE
> > CAST(`FISCAL_YR_BEG_DT` as TIMESTAMP)
> > . . . . . . . > END as `FISCAL_YR_BEG_DT`,
> > . . . . . . . > CASE
> > . . . . . . . > WHEN `FISCAL_YR_END_DT` = 'null' THEN NULL ELSE
> > CAST(`FISCAL_YR_END_DT` as TIMESTAMP)
> > . . . . . . . > END as `FISCAL_YR_END_DT`,
> > . . . . . . . > CASE
> > . . . . . . . > WHEN `CURR_FISCAL_YR` = 'null' THEN NULL ELSE
> > CAST(`CURR_FISCAL_YR` as INT)
> > . . . . . . . > END as `CURR_FISCAL_YR`,
> > . . . . . . . > CASE
> > . . . . . . . > WHEN `CURR_FISCAL_HALF` = 'null' THEN NULL ELSE
> > CAST(`CURR_FISCAL_HALF` as INT)
> > . . . . . . . > END as `CURR_FISCAL_HALF`,
> > . . . . . . . > CASE
> > . . . . . . . > WHEN `CURR_FISCAL_QTR` = 'null' THEN NULL ELSE
> > CAST(`CURR_FISCAL_QTR` as INT)
> > . . . . . . . > END as `CURR_FISCAL_QTR`,
> > . . . . . . . > CASE
> > . . . . . . . > WHEN `CURR_FISCAL_MO` = 'null' THEN NULL ELSE
> > CAST(`CURR_FISCAL_MO` as INT)
> > . . . . . . . > END as `CURR_FISCAL_MO`,
> > . . . . . . . > CASE
> > . . . . . . . > WHEN `CURR_FISCAL_WEEK` = 'null' THEN NULL ELSE
> > CAST(`CURR_FISCAL_WEEK` as INT)
> > . . . . . . . > END as `CURR_FISCAL_WEEK`,
> > . . . . . . . > CASE
> > . . . . . . . > WHEN `LOAD_DTS` = 'null' THEN NULL ELSE
> > CAST(`LOAD_DTS` as TIMESTAMP)
> > . . . . . . . > END as `LOAD_DTS`
> > . . . . . . . > FROM hive.dim.`YEAR_DATE_DIM`;
> > Error: SYSTEM ERROR: NumberFormatException: For input string: "null"
> >
> > Fragment 0:0
> >
> > [Error Id: 2755cf80-04b3-4fa7-be66-20158f115f27 on chlhad02:31010]
> > (state=,code=0)
> >
> > Simplified query:
> > Using a CASE statement only on the field containing the null values
> works:
> >
> > 0: jdbc:drill:> select cast(curr_fiscal_week as int) as
> > curr_fiscal_week, case when load_dts = 'null' then null else
> > cast(load_dts as timestamp) end as `load_dts` from
> > hive.dim.`YEAR_DATE_DIM`;
> > +---++
> > | curr_fiscal_week  |load_dts|
> > +---++
> > | 12| 2014-02-07 15:11:35.0  |
> > | 12| 2014-02-07 15:11:35.0  |
> > | 12| 2012-11-08 00:00:00.0  |
> > | 12| 2012-11-08 00:00:00.0  |
> > | 12