Re: NumberFormatException with cast to double?

2016-03-13 Thread Hsuan Yi Chu
If you like to treat "" as NULL, please set this option
"drill.exec.functions.cast_empty_string_to_null" as true in the system
level.

Meaning,
alter system set `drill.exec.functions.cast_empty_string_to_null` = true

On Sun, Mar 13, 2016 at 3:53 PM, Matt  wrote:

> That does indeed remove the NumberFormatException, replacing it with the
> same error on "empty String".
>
> As indicated by Jason in this thread I may have mistakenly believed that
> empty columns would be treated as NULLs.
>
> Is there a setting to set the NULL value in the data, or should all of
> those COALESCE be replaced with CASE statements? It would be a lot of CASEs
> in this case (39 of them).
>
>
> On 11 Mar 2016, at 0:31, Abdel Hakim Deneche wrote:
>
> Looks like the COALESCE function is the source of the problem. Passing a
>> double (0.0) instead of an int (0) as a second expression solved the
>> problem for me:
>>
>> CAST(COALESCE(t_total, 0.0) AS double)
>>
>>
>> On Fri, Mar 11, 2016 at 12:45 AM, Matt  wrote:
>>
>> ~~~
>>> 00-01  Project(date_tm=[CAST($23):TIMESTAMP(0)],
>>> id_1=[CAST($11):VARCHAR(1) CHARACTER SET "ISO-8859-1" COLLATE
>>> "ISO-8859-1$en_US$primary"], id_2=[CAST($15):VARCHAR(1) CHARACTER SET
>>> "ISO-8859-1" COLLATE "ISO-8859-1$en_US$primary"],
>>> id_3=[CAST($33):VARCHAR(1) CHARACTER SET "ISO-8859-1" COLLATE
>>> "ISO-8859-1$en_US$primary"], b_total=[CAST(CASE(IS NOT NULL($4), $4,
>>> 0)):BIGINT], t_total=[CAST(CASE(IS NOT NULL($31), $31, 0)):DOUBLE],
>>> h_total=[CAST(CASE(IS NOT NULL($40), $40, 0)):BIGINT],
>>> b_small=[CAST(CASE(IS NOT NULL($36), $36, 0)):BIGINT],
>>> t_small=[CAST(CASE(IS NOT NULL($14), $14, 0)):DOUBLE],
>>> h_small=[CAST(CASE(IS NOT NULL($38), $38, 0)):BIGINT],
>>> b_18000=[CAST(CASE(IS NOT NULL($32), $32, 0)):BIGINT],
>>> t_18000=[CAST(CASE(IS NOT NULL($24), $24, 0)):DOUBLE],
>>> h_18000=[CAST(CASE(IS NOT NULL($27), $27, 0)):BIGINT],
>>> b_12000=[CAST(CASE(IS NOT NULL($30), $30, 0)):BIGINT],
>>> t_12000=[CAST(CASE(IS NOT NULL($28), $28, 0)):DOUBLE],
>>> h_12000=[CAST(CASE(IS NOT NULL($20), $20, 0)):BIGINT],
>>> b_6000=[CAST(CASE(IS
>>> NOT NULL($41), $41, 0)):BIGINT], t_6000=[CAST(CASE(IS NOT NULL($37), $37,
>>> 0)):DOUBLE], h_6000=[CAST(CASE(IS NOT NULL($29), $29, 0)):BIGINT],
>>> b_3000=[CAST(CASE(IS NOT NULL($17), $17, 0)):BIGINT],
>>> t_3000=[CAST(CASE(IS
>>> NOT NULL($7), $7, 0)):DOUBLE], h_3000=[CAST(CASE(IS NOT NULL($1), $1,
>>> 0)):BIGINT], b_2000=[CAST(CASE(IS NOT NULL($26), $26, 0)):BIGINT],
>>> t_2000=[CAST(CASE(IS NOT NULL($34), $34, 0)):DOUBLE],
>>> h_2000=[CAST(CASE(IS
>>> NOT NULL($10), $10, 0)):BIGINT], b_1500=[CAST(CASE(IS NOT NULL($42), $42,
>>> 0)):BIGINT], t_1500=[CAST(CASE(IS NOT NULL($13), $13, 0)):DOUBLE],
>>> h_1500=[CAST(CASE(IS NOT NULL($3), $3, 0)):BIGINT], b_1250=[CAST(CASE(IS
>>> NOT NULL($21), $21, 0)):BIGINT], t_1250=[CAST(CASE(IS NOT NULL($25), $25,
>>> 0)):DOUBLE], h_1250=[CAST(CASE(IS NOT NULL($16), $16, 0)):BIGINT],
>>> b_1000=[CAST(CASE(IS NOT NULL($12), $12, 0)):BIGINT],
>>> t_1000=[CAST(CASE(IS
>>> NOT NULL($19), $19, 0)):DOUBLE], h_1000=[CAST(CASE(IS NOT NULL($6), $6,
>>> 0)):BIGINT], b_750=[CAST(CASE(IS NOT NULL($9), $9, 0)):BIGINT],
>>> t_750=[CAST(CASE(IS NOT NULL($0), $0, 0)):DOUBLE], h_750=[CAST(CASE(IS
>>> NOT
>>> NULL($5), $5, 0)):BIGINT], b_500=[CAST(CASE(IS NOT NULL($2), $2,
>>> 0)):BIGINT], t_500=[CAST(CASE(IS NOT NULL($8), $8, 0)):DOUBLE],
>>> h_500=[CAST(CASE(IS NOT NULL($39), $39, 0)):BIGINT], b_0=[CAST(CASE(IS
>>> NOT
>>> NULL($18), $18, 0)):BIGINT], t_0=[CAST(CASE(IS NOT NULL($35), $35,
>>> 0)):DOUBLE], EXPR$42=[CAST(CASE(IS NOT NULL($22), $22, 0)):BIGINT])
>>> 00-02Scan(groupscan=[EasyGroupScan
>>> [selectionRoot=hdfs://es05:54310/caspr/csv/smalltest.csv, numFiles=1,
>>> columns=[`date_tm`, `id_1`, `id_2`, `id_3`, `b_total`, `t_total`,
>>> `h_total`, `b_small`, `t_small`, `h_small`, `b_18000`, `t_18000`,
>>> `h_18000`, `b_12000`, `t_12000`, `h_12000`, `b_6000`, `t_6000`, `h_6000`,
>>> `b_3000`, `t_3000`, `h_3000`, `b_2000`, `t_2000`, `h_2000`, `b_1500`,
>>> `t_1500`, `h_1500`, `b_1250`, `t_1250`, `h_1250`, `b_1000`, `t_1000`,
>>> `h_1000`, `b_750`, `t_750`, `h_750`, `b_500`, `t_500`, `h_500`, `b_0`,
>>> `t_0`, `h_0`], files=[hdfs://es05:54310/caspr/csv/smalltest.csv]]])
>>> ~~~
>>>
>>> ~~~
>>> {
>>>   "head" : {
>>> "version" : 1,
>>> "generator" : {
>>>   "type" : "ExplainHandler",
>>>   "info" : ""
>>> },
>>> "type" : "APACHE_DRILL_PHYSICAL",
>>> "options" : [ {
>>>   "kind" : "STRING",
>>>   "type" : "SESSION",
>>>   "name" : "store.format",
>>>   "string_val" : "parquet"
>>> }, {
>>>   "kind" : "BOOLEAN",
>>>   "type" : "SESSION",
>>>   "name" : "exec.errors.verbose",
>>>   "bool_val" : true
>>> } ],
>>> "queue" : 0,
>>> "resultMode" : "EXEC"
>>>   },
>>>   "graph" : [ {
>>> "pop" : "fs-scan",
>>> "@id" : 2,
>>> "userName" : "hduser",
>>> "files" : [ "hdfs://es05:54310/caspr/csv/smalltest.csv" ],
>>> "storage" : {
>>>   "t

Re: NumberFormatException with cast to double?

2016-03-13 Thread Matt
That does indeed remove the NumberFormatException, replacing it with the 
same error on "empty String".


As indicated by Jason in this thread I may have mistakenly believed that 
empty columns would be treated as NULLs.


Is there a setting to set the NULL value in the data, or should all of 
those COALESCE be replaced with CASE statements? It would be a lot of 
CASEs in this case (39 of them).



On 11 Mar 2016, at 0:31, Abdel Hakim Deneche wrote:

Looks like the COALESCE function is the source of the problem. Passing 
a

double (0.0) instead of an int (0) as a second expression solved the
problem for me:

CAST(COALESCE(t_total, 0.0) AS double)


On Fri, Mar 11, 2016 at 12:45 AM, Matt  wrote:


~~~
00-01  Project(date_tm=[CAST($23):TIMESTAMP(0)],
id_1=[CAST($11):VARCHAR(1) CHARACTER SET "ISO-8859-1" COLLATE
"ISO-8859-1$en_US$primary"], id_2=[CAST($15):VARCHAR(1) CHARACTER SET
"ISO-8859-1" COLLATE "ISO-8859-1$en_US$primary"],
id_3=[CAST($33):VARCHAR(1) CHARACTER SET "ISO-8859-1" COLLATE
"ISO-8859-1$en_US$primary"], b_total=[CAST(CASE(IS NOT NULL($4), $4,
0)):BIGINT], t_total=[CAST(CASE(IS NOT NULL($31), $31, 0)):DOUBLE],
h_total=[CAST(CASE(IS NOT NULL($40), $40, 0)):BIGINT],
b_small=[CAST(CASE(IS NOT NULL($36), $36, 0)):BIGINT],
t_small=[CAST(CASE(IS NOT NULL($14), $14, 0)):DOUBLE],
h_small=[CAST(CASE(IS NOT NULL($38), $38, 0)):BIGINT],
b_18000=[CAST(CASE(IS NOT NULL($32), $32, 0)):BIGINT],
t_18000=[CAST(CASE(IS NOT NULL($24), $24, 0)):DOUBLE],
h_18000=[CAST(CASE(IS NOT NULL($27), $27, 0)):BIGINT],
b_12000=[CAST(CASE(IS NOT NULL($30), $30, 0)):BIGINT],
t_12000=[CAST(CASE(IS NOT NULL($28), $28, 0)):DOUBLE],
h_12000=[CAST(CASE(IS NOT NULL($20), $20, 0)):BIGINT], 
b_6000=[CAST(CASE(IS
NOT NULL($41), $41, 0)):BIGINT], t_6000=[CAST(CASE(IS NOT NULL($37), 
$37,

0)):DOUBLE], h_6000=[CAST(CASE(IS NOT NULL($29), $29, 0)):BIGINT],
b_3000=[CAST(CASE(IS NOT NULL($17), $17, 0)):BIGINT], 
t_3000=[CAST(CASE(IS

NOT NULL($7), $7, 0)):DOUBLE], h_3000=[CAST(CASE(IS NOT NULL($1), $1,
0)):BIGINT], b_2000=[CAST(CASE(IS NOT NULL($26), $26, 0)):BIGINT],
t_2000=[CAST(CASE(IS NOT NULL($34), $34, 0)):DOUBLE], 
h_2000=[CAST(CASE(IS
NOT NULL($10), $10, 0)):BIGINT], b_1500=[CAST(CASE(IS NOT NULL($42), 
$42,

0)):BIGINT], t_1500=[CAST(CASE(IS NOT NULL($13), $13, 0)):DOUBLE],
h_1500=[CAST(CASE(IS NOT NULL($3), $3, 0)):BIGINT], 
b_1250=[CAST(CASE(IS
NOT NULL($21), $21, 0)):BIGINT], t_1250=[CAST(CASE(IS NOT NULL($25), 
$25,

0)):DOUBLE], h_1250=[CAST(CASE(IS NOT NULL($16), $16, 0)):BIGINT],
b_1000=[CAST(CASE(IS NOT NULL($12), $12, 0)):BIGINT], 
t_1000=[CAST(CASE(IS
NOT NULL($19), $19, 0)):DOUBLE], h_1000=[CAST(CASE(IS NOT NULL($6), 
$6,

0)):BIGINT], b_750=[CAST(CASE(IS NOT NULL($9), $9, 0)):BIGINT],
t_750=[CAST(CASE(IS NOT NULL($0), $0, 0)):DOUBLE], 
h_750=[CAST(CASE(IS NOT

NULL($5), $5, 0)):BIGINT], b_500=[CAST(CASE(IS NOT NULL($2), $2,
0)):BIGINT], t_500=[CAST(CASE(IS NOT NULL($8), $8, 0)):DOUBLE],
h_500=[CAST(CASE(IS NOT NULL($39), $39, 0)):BIGINT], 
b_0=[CAST(CASE(IS NOT

NULL($18), $18, 0)):BIGINT], t_0=[CAST(CASE(IS NOT NULL($35), $35,
0)):DOUBLE], EXPR$42=[CAST(CASE(IS NOT NULL($22), $22, 0)):BIGINT])
00-02Scan(groupscan=[EasyGroupScan
[selectionRoot=hdfs://es05:54310/caspr/csv/smalltest.csv, numFiles=1,
columns=[`date_tm`, `id_1`, `id_2`, `id_3`, `b_total`, `t_total`,
`h_total`, `b_small`, `t_small`, `h_small`, `b_18000`, `t_18000`,
`h_18000`, `b_12000`, `t_12000`, `h_12000`, `b_6000`, `t_6000`, 
`h_6000`,

`b_3000`, `t_3000`, `h_3000`, `b_2000`, `t_2000`, `h_2000`, `b_1500`,
`t_1500`, `h_1500`, `b_1250`, `t_1250`, `h_1250`, `b_1000`, `t_1000`,
`h_1000`, `b_750`, `t_750`, `h_750`, `b_500`, `t_500`, `h_500`, 
`b_0`,

`t_0`, `h_0`], files=[hdfs://es05:54310/caspr/csv/smalltest.csv]]])
~~~

~~~
{
  "head" : {
"version" : 1,
"generator" : {
  "type" : "ExplainHandler",
  "info" : ""
},
"type" : "APACHE_DRILL_PHYSICAL",
"options" : [ {
  "kind" : "STRING",
  "type" : "SESSION",
  "name" : "store.format",
  "string_val" : "parquet"
}, {
  "kind" : "BOOLEAN",
  "type" : "SESSION",
  "name" : "exec.errors.verbose",
  "bool_val" : true
} ],
"queue" : 0,
"resultMode" : "EXEC"
  },
  "graph" : [ {
"pop" : "fs-scan",
"@id" : 2,
"userName" : "hduser",
"files" : [ "hdfs://es05:54310/caspr/csv/smalltest.csv" ],
"storage" : {
  "type" : "file",
  "enabled" : true,
  "connection" : "hdfs://es05:54310",
  "workspaces" : {
"root" : {
  "location" : "/",
  "writable" : true,
  "defaultInputFormat" : null
},
"tmp" : {
  "location" : "/tmp/",
  "writable" : true,
  "defaultInputFormat" : null
},
"caspr" : {
  "location" : "/caspr",
  "writable" : true,
  "defaultInputFormat" : "csv"
}
  "ref" : "`t_3000`",
  "expr" : "cast( ( ( if (isnotnull(`t_3000`)  ) then (`t_3000` )
else (0 )  end  )  ) as FLOAT8

Re: NumberFormatException with cast to double?

2016-03-10 Thread Abdel Hakim Deneche
Looks like the COALESCE function is the source of the problem. Passing a
double (0.0) instead of an int (0) as a second expression solved the
problem for me:

CAST(COALESCE(t_total, 0.0) AS double)


On Fri, Mar 11, 2016 at 12:45 AM, Matt  wrote:

> ~~~
> 00-01  Project(date_tm=[CAST($23):TIMESTAMP(0)],
> id_1=[CAST($11):VARCHAR(1) CHARACTER SET "ISO-8859-1" COLLATE
> "ISO-8859-1$en_US$primary"], id_2=[CAST($15):VARCHAR(1) CHARACTER SET
> "ISO-8859-1" COLLATE "ISO-8859-1$en_US$primary"],
> id_3=[CAST($33):VARCHAR(1) CHARACTER SET "ISO-8859-1" COLLATE
> "ISO-8859-1$en_US$primary"], b_total=[CAST(CASE(IS NOT NULL($4), $4,
> 0)):BIGINT], t_total=[CAST(CASE(IS NOT NULL($31), $31, 0)):DOUBLE],
> h_total=[CAST(CASE(IS NOT NULL($40), $40, 0)):BIGINT],
> b_small=[CAST(CASE(IS NOT NULL($36), $36, 0)):BIGINT],
> t_small=[CAST(CASE(IS NOT NULL($14), $14, 0)):DOUBLE],
> h_small=[CAST(CASE(IS NOT NULL($38), $38, 0)):BIGINT],
> b_18000=[CAST(CASE(IS NOT NULL($32), $32, 0)):BIGINT],
> t_18000=[CAST(CASE(IS NOT NULL($24), $24, 0)):DOUBLE],
> h_18000=[CAST(CASE(IS NOT NULL($27), $27, 0)):BIGINT],
> b_12000=[CAST(CASE(IS NOT NULL($30), $30, 0)):BIGINT],
> t_12000=[CAST(CASE(IS NOT NULL($28), $28, 0)):DOUBLE],
> h_12000=[CAST(CASE(IS NOT NULL($20), $20, 0)):BIGINT], b_6000=[CAST(CASE(IS
> NOT NULL($41), $41, 0)):BIGINT], t_6000=[CAST(CASE(IS NOT NULL($37), $37,
> 0)):DOUBLE], h_6000=[CAST(CASE(IS NOT NULL($29), $29, 0)):BIGINT],
> b_3000=[CAST(CASE(IS NOT NULL($17), $17, 0)):BIGINT], t_3000=[CAST(CASE(IS
> NOT NULL($7), $7, 0)):DOUBLE], h_3000=[CAST(CASE(IS NOT NULL($1), $1,
> 0)):BIGINT], b_2000=[CAST(CASE(IS NOT NULL($26), $26, 0)):BIGINT],
> t_2000=[CAST(CASE(IS NOT NULL($34), $34, 0)):DOUBLE], h_2000=[CAST(CASE(IS
> NOT NULL($10), $10, 0)):BIGINT], b_1500=[CAST(CASE(IS NOT NULL($42), $42,
> 0)):BIGINT], t_1500=[CAST(CASE(IS NOT NULL($13), $13, 0)):DOUBLE],
> h_1500=[CAST(CASE(IS NOT NULL($3), $3, 0)):BIGINT], b_1250=[CAST(CASE(IS
> NOT NULL($21), $21, 0)):BIGINT], t_1250=[CAST(CASE(IS NOT NULL($25), $25,
> 0)):DOUBLE], h_1250=[CAST(CASE(IS NOT NULL($16), $16, 0)):BIGINT],
> b_1000=[CAST(CASE(IS NOT NULL($12), $12, 0)):BIGINT], t_1000=[CAST(CASE(IS
> NOT NULL($19), $19, 0)):DOUBLE], h_1000=[CAST(CASE(IS NOT NULL($6), $6,
> 0)):BIGINT], b_750=[CAST(CASE(IS NOT NULL($9), $9, 0)):BIGINT],
> t_750=[CAST(CASE(IS NOT NULL($0), $0, 0)):DOUBLE], h_750=[CAST(CASE(IS NOT
> NULL($5), $5, 0)):BIGINT], b_500=[CAST(CASE(IS NOT NULL($2), $2,
> 0)):BIGINT], t_500=[CAST(CASE(IS NOT NULL($8), $8, 0)):DOUBLE],
> h_500=[CAST(CASE(IS NOT NULL($39), $39, 0)):BIGINT], b_0=[CAST(CASE(IS NOT
> NULL($18), $18, 0)):BIGINT], t_0=[CAST(CASE(IS NOT NULL($35), $35,
> 0)):DOUBLE], EXPR$42=[CAST(CASE(IS NOT NULL($22), $22, 0)):BIGINT])
> 00-02Scan(groupscan=[EasyGroupScan
> [selectionRoot=hdfs://es05:54310/caspr/csv/smalltest.csv, numFiles=1,
> columns=[`date_tm`, `id_1`, `id_2`, `id_3`, `b_total`, `t_total`,
> `h_total`, `b_small`, `t_small`, `h_small`, `b_18000`, `t_18000`,
> `h_18000`, `b_12000`, `t_12000`, `h_12000`, `b_6000`, `t_6000`, `h_6000`,
> `b_3000`, `t_3000`, `h_3000`, `b_2000`, `t_2000`, `h_2000`, `b_1500`,
> `t_1500`, `h_1500`, `b_1250`, `t_1250`, `h_1250`, `b_1000`, `t_1000`,
> `h_1000`, `b_750`, `t_750`, `h_750`, `b_500`, `t_500`, `h_500`, `b_0`,
> `t_0`, `h_0`], files=[hdfs://es05:54310/caspr/csv/smalltest.csv]]])
> ~~~
>
> ~~~
> {
>   "head" : {
> "version" : 1,
> "generator" : {
>   "type" : "ExplainHandler",
>   "info" : ""
> },
> "type" : "APACHE_DRILL_PHYSICAL",
> "options" : [ {
>   "kind" : "STRING",
>   "type" : "SESSION",
>   "name" : "store.format",
>   "string_val" : "parquet"
> }, {
>   "kind" : "BOOLEAN",
>   "type" : "SESSION",
>   "name" : "exec.errors.verbose",
>   "bool_val" : true
> } ],
> "queue" : 0,
> "resultMode" : "EXEC"
>   },
>   "graph" : [ {
> "pop" : "fs-scan",
> "@id" : 2,
> "userName" : "hduser",
> "files" : [ "hdfs://es05:54310/caspr/csv/smalltest.csv" ],
> "storage" : {
>   "type" : "file",
>   "enabled" : true,
>   "connection" : "hdfs://es05:54310",
>   "workspaces" : {
> "root" : {
>   "location" : "/",
>   "writable" : true,
>   "defaultInputFormat" : null
> },
> "tmp" : {
>   "location" : "/tmp/",
>   "writable" : true,
>   "defaultInputFormat" : null
> },
> "caspr" : {
>   "location" : "/caspr",
>   "writable" : true,
>   "defaultInputFormat" : "csv"
> }
>   "ref" : "`t_3000`",
>   "expr" : "cast( ( ( if (isnotnull(`t_3000`)  ) then (`t_3000` )
> else (0 )  end  )  ) as FLOAT8 )"
> }, {
>   "ref" : "`h_3000`",
>   "expr" : "cast( ( ( if (isnotnull(`h_3000`)  ) then (`h_3000` )
> else (0 )  end  )  ) as BIGINT )"
> }, {
>   "ref" : "`b_2000`",
>   "expr" : "cast( ( ( if (isnotnull(`b_2000`)  ) then (`b_2000` )
> else (0 )  end  )

Re: NumberFormatException with cast to double?

2016-03-10 Thread Matt

~~~
00-01  Project(date_tm=[CAST($23):TIMESTAMP(0)], 
id_1=[CAST($11):VARCHAR(1) CHARACTER SET "ISO-8859-1" COLLATE 
"ISO-8859-1$en_US$primary"], id_2=[CAST($15):VARCHAR(1) CHARACTER SET 
"ISO-8859-1" COLLATE "ISO-8859-1$en_US$primary"], 
id_3=[CAST($33):VARCHAR(1) CHARACTER SET "ISO-8859-1" COLLATE 
"ISO-8859-1$en_US$primary"], b_total=[CAST(CASE(IS NOT NULL($4), $4, 
0)):BIGINT], t_total=[CAST(CASE(IS NOT NULL($31), $31, 0)):DOUBLE], 
h_total=[CAST(CASE(IS NOT NULL($40), $40, 0)):BIGINT], 
b_small=[CAST(CASE(IS NOT NULL($36), $36, 0)):BIGINT], 
t_small=[CAST(CASE(IS NOT NULL($14), $14, 0)):DOUBLE], 
h_small=[CAST(CASE(IS NOT NULL($38), $38, 0)):BIGINT], 
b_18000=[CAST(CASE(IS NOT NULL($32), $32, 0)):BIGINT], 
t_18000=[CAST(CASE(IS NOT NULL($24), $24, 0)):DOUBLE], 
h_18000=[CAST(CASE(IS NOT NULL($27), $27, 0)):BIGINT], 
b_12000=[CAST(CASE(IS NOT NULL($30), $30, 0)):BIGINT], 
t_12000=[CAST(CASE(IS NOT NULL($28), $28, 0)):DOUBLE], 
h_12000=[CAST(CASE(IS NOT NULL($20), $20, 0)):BIGINT], 
b_6000=[CAST(CASE(IS NOT NULL($41), $41, 0)):BIGINT], 
t_6000=[CAST(CASE(IS NOT NULL($37), $37, 0)):DOUBLE], 
h_6000=[CAST(CASE(IS NOT NULL($29), $29, 0)):BIGINT], 
b_3000=[CAST(CASE(IS NOT NULL($17), $17, 0)):BIGINT], 
t_3000=[CAST(CASE(IS NOT NULL($7), $7, 0)):DOUBLE], h_3000=[CAST(CASE(IS 
NOT NULL($1), $1, 0)):BIGINT], b_2000=[CAST(CASE(IS NOT NULL($26), $26, 
0)):BIGINT], t_2000=[CAST(CASE(IS NOT NULL($34), $34, 0)):DOUBLE], 
h_2000=[CAST(CASE(IS NOT NULL($10), $10, 0)):BIGINT], 
b_1500=[CAST(CASE(IS NOT NULL($42), $42, 0)):BIGINT], 
t_1500=[CAST(CASE(IS NOT NULL($13), $13, 0)):DOUBLE], 
h_1500=[CAST(CASE(IS NOT NULL($3), $3, 0)):BIGINT], b_1250=[CAST(CASE(IS 
NOT NULL($21), $21, 0)):BIGINT], t_1250=[CAST(CASE(IS NOT NULL($25), 
$25, 0)):DOUBLE], h_1250=[CAST(CASE(IS NOT NULL($16), $16, 0)):BIGINT], 
b_1000=[CAST(CASE(IS NOT NULL($12), $12, 0)):BIGINT], 
t_1000=[CAST(CASE(IS NOT NULL($19), $19, 0)):DOUBLE], 
h_1000=[CAST(CASE(IS NOT NULL($6), $6, 0)):BIGINT], b_750=[CAST(CASE(IS 
NOT NULL($9), $9, 0)):BIGINT], t_750=[CAST(CASE(IS NOT NULL($0), $0, 
0)):DOUBLE], h_750=[CAST(CASE(IS NOT NULL($5), $5, 0)):BIGINT], 
b_500=[CAST(CASE(IS NOT NULL($2), $2, 0)):BIGINT], t_500=[CAST(CASE(IS 
NOT NULL($8), $8, 0)):DOUBLE], h_500=[CAST(CASE(IS NOT NULL($39), $39, 
0)):BIGINT], b_0=[CAST(CASE(IS NOT NULL($18), $18, 0)):BIGINT], 
t_0=[CAST(CASE(IS NOT NULL($35), $35, 0)):DOUBLE], EXPR$42=[CAST(CASE(IS 
NOT NULL($22), $22, 0)):BIGINT])
00-02Scan(groupscan=[EasyGroupScan 
[selectionRoot=hdfs://es05:54310/caspr/csv/smalltest.csv, numFiles=1, 
columns=[`date_tm`, `id_1`, `id_2`, `id_3`, `b_total`, `t_total`, 
`h_total`, `b_small`, `t_small`, `h_small`, `b_18000`, `t_18000`, 
`h_18000`, `b_12000`, `t_12000`, `h_12000`, `b_6000`, `t_6000`, 
`h_6000`, `b_3000`, `t_3000`, `h_3000`, `b_2000`, `t_2000`, `h_2000`, 
`b_1500`, `t_1500`, `h_1500`, `b_1250`, `t_1250`, `h_1250`, `b_1000`, 
`t_1000`, `h_1000`, `b_750`, `t_750`, `h_750`, `b_500`, `t_500`, 
`h_500`, `b_0`, `t_0`, `h_0`], 
files=[hdfs://es05:54310/caspr/csv/smalltest.csv]]])

~~~

~~~
{
  "head" : {
"version" : 1,
"generator" : {
  "type" : "ExplainHandler",
  "info" : ""
},
"type" : "APACHE_DRILL_PHYSICAL",
"options" : [ {
  "kind" : "STRING",
  "type" : "SESSION",
  "name" : "store.format",
  "string_val" : "parquet"
}, {
  "kind" : "BOOLEAN",
  "type" : "SESSION",
  "name" : "exec.errors.verbose",
  "bool_val" : true
} ],
"queue" : 0,
"resultMode" : "EXEC"
  },
  "graph" : [ {
"pop" : "fs-scan",
"@id" : 2,
"userName" : "hduser",
"files" : [ "hdfs://es05:54310/caspr/csv/smalltest.csv" ],
"storage" : {
  "type" : "file",
  "enabled" : true,
  "connection" : "hdfs://es05:54310",
  "workspaces" : {
"root" : {
  "location" : "/",
  "writable" : true,
  "defaultInputFormat" : null
},
"tmp" : {
  "location" : "/tmp/",
  "writable" : true,
  "defaultInputFormat" : null
},
"caspr" : {
  "location" : "/caspr",
  "writable" : true,
  "defaultInputFormat" : "csv"
}
  "ref" : "`t_3000`",
  "expr" : "cast( ( ( if (isnotnull(`t_3000`)  ) then (`t_3000` )  
else (0 )  end  )  ) as FLOAT8 )"

}, {
  "ref" : "`h_3000`",
  "expr" : "cast( ( ( if (isnotnull(`h_3000`)  ) then (`h_3000` )  
else (0 )  end  )  ) as BIGINT )"

}, {
  "ref" : "`b_2000`",
  "expr" : "cast( ( ( if (isnotnull(`b_2000`)  ) then (`b_2000` )  
else (0 )  end  )  ) as BIGINT )"

}, {
  "ref" : "`t_2000`",
  "expr" : "cast( ( ( if (isnotnull(`t_2000`)  ) then (`t_2000` )  
else (0 )  end  )  ) as FLOAT8 )"

}, {
  "ref" : "`h_2000`",
  "expr" : "cast( ( ( if (isnotnull(`h_2000`)  ) then (`h_2000` )  
else (0 )  end  )  ) as BIGINT )"

}, {
  "ref" : "`b_1500`",
  "expr" : "cast( ( ( if (isnotnull(`b_1500`)  ) then (`b_1500` 

Re: NumberFormatException with cast to double?

2016-03-10 Thread Hsuan Yi Chu
>From the log, I see this one:

StringFunctionHelpers.varCharToInt():

It seems like there might be an implicit casting (to integer) be inserted
into your expression.

Can you do EXPLAIN PLAN FOR  "your query" (not double quoted needed)

and share what you see?

On Thu, Mar 10, 2016 at 3:11 PM, Matt  wrote:

> TRIM() did not help, and I visually verified there is no whitespace around
> the number: The entire test data file:
>
> ~~~
>
> date_tm,id_1,id_2,id_3,b_total,t_total,h_total,b_small,t_small,h_small,b_18000,t_18000,h_18000,b_12000,t_12000,h_12000,b_6000,t_6000,h_6000,b_3000,t_3000,h_3000,b_2000,t_2000,h_2000,b_1500,t_1500,h_1500,b_1250,t_1250,h_1250,b_1000,t_1000,h_1000,b_750,t_750,h_750,b_500,t_500,h_500,b_0,t_0,h_0
> 2015-10-17
> 00:00:00,f5e9v8u2,err,mi1,268918254,140.755,23519,268918254,140.755,23519,
> ~~~
>
> Including a trim resulted in the same:
>
> ~~~
> Error: SYSTEM ERROR: NumberFormatException: 140.755
>
> Fragment 0:0
>
> [Error Id: 01d368ee-6c06-476c-a553-5eb5baea7c3f on es07:31010]
>
>   (java.lang.NumberFormatException) 140.755
> org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers.nfeI():95
>
> org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers.varCharToInt():120
> org.apache.drill.exec.test.generated.ProjectorGen122.doEval():467
>
> org.apache.drill.exec.test.generated.ProjectorGen122.projectRecords():62
>
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.doWork():175
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():93
>
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():132
> org.apache.drill.exec.record.AbstractRecordBatch.next():162
> org.apache.drill.exec.physical.impl.BaseRootExec.next():104
>
> org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():81
> org.apache.drill.exec.physical.impl.BaseRootExec.next():94
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():256
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():250
> java.security.AccessController.doPrivileged():-2
> javax.security.auth.Subject.doAs():415
> org.apache.hadoop.security.UserGroupInformation.doAs():1657
> org.apache.drill.exec.work.fragment.FragmentExecutor.run():250
> org.apache.drill.common.SelfCleaningRunnable.run():38
> java.util.concurrent.ThreadPoolExecutor.runWorker():1145
> java.util.concurrent.ThreadPoolExecutor$Worker.run():615
> java.lang.Thread.run():745 (state=,code=0)
> ~~~
>
> On 10 Mar 2016, at 17:49, Jason Altekruse wrote:
>
> Could you try throwing a trim() in before the cast? Might just be
>> whitespace.
>>
>> There shouldn't be a need to add the COALESCE statement, as we only ever
>> read defined values (which may be empty string) out of a CSV file. You
>> should instead write out a full case statement that checks for empty
>> string
>> and provides your default value of 0 in that case.
>>
>> - Jason
>>
>> Jason Altekruse
>> Software Engineer at Dremio
>> Apache Drill Committer
>>
>> On Thu, Mar 10, 2016 at 2:32 PM, Matt  wrote:
>>
>> Have some CSV data that Drill 1.5 selects as-is without any problems,
>>> until I attempt to CAST columns in a CTAS or plain SELECT:
>>>
>>> Error: SYSTEM ERROR: NumberFormatException: 140.755
>>>
>>> The data is unquoted CSV, and column in question does have the value
>>> "140.755" (unquoted). As the column can be empty, I am using the
>>> following
>>> transform:
>>>
>>>   CAST(COALESCE(t_total, 0) AS double) AS t_total
>>>
>>> And on the first data row:
>>>
>>> Fragment 1:0
>>>
>>> [Error Id: 4b4d83cf-a87d-451e-9cfa-4280e5adf64f on es08:31010]
>>>
>>>   (java.lang.NumberFormatException) 140.755
>>>
>>> What could be causing this error?
>>>
>>>
>>>


Re: NumberFormatException with cast to double?

2016-03-10 Thread Matt
TRIM() did not help, and I visually verified there is no whitespace 
around the number: The entire test data file:


~~~
date_tm,id_1,id_2,id_3,b_total,t_total,h_total,b_small,t_small,h_small,b_18000,t_18000,h_18000,b_12000,t_12000,h_12000,b_6000,t_6000,h_6000,b_3000,t_3000,h_3000,b_2000,t_2000,h_2000,b_1500,t_1500,h_1500,b_1250,t_1250,h_1250,b_1000,t_1000,h_1000,b_750,t_750,h_750,b_500,t_500,h_500,b_0,t_0,h_0
2015-10-17 
00:00:00,f5e9v8u2,err,mi1,268918254,140.755,23519,268918254,140.755,23519,

~~~

Including a trim resulted in the same:

~~~
Error: SYSTEM ERROR: NumberFormatException: 140.755

Fragment 0:0

[Error Id: 01d368ee-6c06-476c-a553-5eb5baea7c3f on es07:31010]

  (java.lang.NumberFormatException) 140.755
org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers.nfeI():95

org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers.varCharToInt():120

org.apache.drill.exec.test.generated.ProjectorGen122.doEval():467

org.apache.drill.exec.test.generated.ProjectorGen122.projectRecords():62

org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.doWork():175

org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():93

org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():132

org.apache.drill.exec.record.AbstractRecordBatch.next():162
org.apache.drill.exec.physical.impl.BaseRootExec.next():104

org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():81

org.apache.drill.exec.physical.impl.BaseRootExec.next():94
org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():256
org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():250
java.security.AccessController.doPrivileged():-2
javax.security.auth.Subject.doAs():415
org.apache.hadoop.security.UserGroupInformation.doAs():1657
org.apache.drill.exec.work.fragment.FragmentExecutor.run():250
org.apache.drill.common.SelfCleaningRunnable.run():38
java.util.concurrent.ThreadPoolExecutor.runWorker():1145
java.util.concurrent.ThreadPoolExecutor$Worker.run():615
java.lang.Thread.run():745 (state=,code=0)
~~~

On 10 Mar 2016, at 17:49, Jason Altekruse wrote:


Could you try throwing a trim() in before the cast? Might just be
whitespace.

There shouldn't be a need to add the COALESCE statement, as we only 
ever

read defined values (which may be empty string) out of a CSV file. You
should instead write out a full case statement that checks for empty 
string

and provides your default value of 0 in that case.

- Jason

Jason Altekruse
Software Engineer at Dremio
Apache Drill Committer

On Thu, Mar 10, 2016 at 2:32 PM, Matt  wrote:


Have some CSV data that Drill 1.5 selects as-is without any problems,
until I attempt to CAST columns in a CTAS or plain SELECT:

Error: SYSTEM ERROR: NumberFormatException: 140.755

The data is unquoted CSV, and column in question does have the value
"140.755" (unquoted). As the column can be empty, I am using the 
following

transform:

  CAST(COALESCE(t_total, 0) AS double) AS t_total

And on the first data row:

Fragment 1:0

[Error Id: 4b4d83cf-a87d-451e-9cfa-4280e5adf64f on es08:31010]

  (java.lang.NumberFormatException) 140.755

What could be causing this error?




Re: NumberFormatException with cast to double?

2016-03-10 Thread Jason Altekruse
Could you try throwing a trim() in before the cast? Might just be
whitespace.

There shouldn't be a need to add the COALESCE statement, as we only ever
read defined values (which may be empty string) out of a CSV file. You
should instead write out a full case statement that checks for empty string
and provides your default value of 0 in that case.

- Jason

Jason Altekruse
Software Engineer at Dremio
Apache Drill Committer

On Thu, Mar 10, 2016 at 2:32 PM, Matt  wrote:

> Have some CSV data that Drill 1.5 selects as-is without any problems,
> until I attempt to CAST columns in a CTAS or plain SELECT:
>
> Error: SYSTEM ERROR: NumberFormatException: 140.755
>
> The data is unquoted CSV, and column in question does have the value
> "140.755" (unquoted). As the column can be empty, I am using the following
> transform:
>
>   CAST(COALESCE(t_total, 0) AS double) AS t_total
>
> And on the first data row:
>
> Fragment 1:0
>
> [Error Id: 4b4d83cf-a87d-451e-9cfa-4280e5adf64f on es08:31010]
>
>   (java.lang.NumberFormatException) 140.755
>
> What could be causing this error?
>
>