Re: Display column data type without code

2018-04-29 Thread Paul Rogers
Turns out I really needed better type functions in order to explain the nuances 
of Drill types, so I went ahead and created them.

See DRILL-6361, PR #1242 [1]. Examples shown in the PR. Reviewers very much 
appreciated.

Thanks,
- Paul

[1] https://github.com/apache/drill/pull/1242

 

On Saturday, April 28, 2018, 5:58:47 PM PDT, Charles Givre 
 wrote:  
 
 I’d like to weigh in here, but this would be EXTREMELY useful.  When I was 
trying to write connectors to enable various BI tools to connect to Drill, such 
as SQLPad and Metabase, the inability to get information about how drill 
interprets the data was really difficult to get around.  Just me .02. 

> On Apr 28, 2018, at 18:05, Paul Rogers  wrote:
> 
> Hi Rob,
> 
> Thanks for the suggestion. While this works for Hive (as you showed), it does 
> not work for CSV files:
> 
> DESCRIBE `csvh/cust.csvh`;
> +--++--+
> | COLUMN_NAME  | DATA_TYPE  | IS_NULLABLE  |
> +--++--+
> +--++--+
> 
> The typeof() function is handy, but does not report the "is nullable" (or 
> repeated) "mode" of a column, and it loses the data type if a value is null. 
> The following CSV file (with headers) uses non-nullable VARCHAR columns:
> 
> SELECT typeof(custId) FROM `csvh/cust.csvh`;
> +--+
> |  EXPR$0  |
> +--+
> | VARCHAR  |
> +--+
> 
> Now, do something similar with JSON which uses a (nullable) VARCHAR:
> 
> SELECT typeof(a) FROM `json/str-null.json`;
> +--+
> |  EXPR$0  |
> +--+
> | VARCHAR  |
> | NULL    |
> +--+
> 
> Finally, use a CSV file without headers, so that all columns are returned in 
> the columns[] array:
> 
> SELECT typeof(columns) FROM `csv/cust.csv`;
> +--+
> |  EXPR$0  |
> +--+
> | VARCHAR  |
> +--+
> 
> We know that the three "VARCHAR" are different because we know how Drill 
> works internally. But, the output of sqlline does not express that knowledge.
> 
> Sqlline presents all data as strings, which often hides the data type and 
> other details, making lit look like things work better than they actually do. 
> You can see this by running a query against two JSON where a VarChar column 
> is missing from one of the files. Drill guesses "nullable Int",  Sqlline 
> shows the value as null, and typeof() shows the type as NULL, hiding the fact 
> that there is actually a schema conflict (schema change) lurking in the data 
> that manifests only if, say, you sort the data.
> 
> Bottom line: it seems that, at present, there isn't a good way at present 
> (short of writing some Java code that uses the native Drill API) to get the 
> actually, detailed type of a column with both data type and cardinality 
> ("mode").
> 
> 
> So, would be great when explaining Drill concepts, if there was a clean 
> non-code way to show people the actual structure of the data. (Yep, I know 
> Drill is open source and welcomes contributions, so I'll try to offer a 
> solution when I get time...)
> 
> Thanks,
> - Paul
> 
> 
> 
>    On Thursday, April 26, 2018, 10:08:04 AM PDT, Rob Wu  
>wrote:  
> 
> Hi Paul,
> 
> You could also use DESCRIBE (https://drill.apache.org/docs/describe/).
> 
> 0: jdbc:drill:drillbit=localhost:31010> describe
> `hive.default`.`integer_table`
> . . . . . . . . . . . . . . . . . . . > ;
> +--++--+
> | COLUMN_NAME  |    DATA_TYPE      | IS_NULLABLE  |
> +--++--+
> | keycolumn    | CHARACTER VARYING  | YES          |
> | column1      | INTEGER            | YES          |
> +--++
> 
> Best regards,
> 
> Rob
> 
> On Wed, Apr 25, 2018 at 10:12 PM, Abhishek Girish 
> wrote:
> 
>> Hey Paul,
>> 
>> You could use the typeof() function for this purpose. It takes a single
>> parameter - the column name.
>> 
>> For example:
>>> select typeof(c_current_cdemo_sk) from customer limit 1;
>> +-+
>> | EXPR$0  |
>> +-+
>> | BIGINT  |
>> +-+
>> 1 row selected (0.472 seconds)
>> 
>> 
>> On Wed, Apr 25, 2018 at 9:23 PM Paul Rogers 
>> wrote:
>> 
>>> Hi All,
>>> Anyone know if there is a non-code way to display the data types of
>>> columns returned from a Drill query? Sqlline appears to only show the
>>> column names and values. The same is true of the Drill web console.
>>> The EXPLAIN PLAN FOR ... command shows the query plan, but not type
>> (which
>>> are only known at run time.) Is there a statement, system table or some
>>> other trick to display column types in, say, Sqlline?
>>> In the past, I've gotten the types by using unit test style code. But,
>>> that is not to handy for use as an example for non-developers...
>>> Thanks,
>>> - Paul
>>> 
>>> 
>> 
  

Re: Display column data type without code

2018-04-28 Thread Charles Givre
I’d like to weigh in here, but this would be EXTREMELY useful.  When I was 
trying to write connectors to enable various BI tools to connect to Drill, such 
as SQLPad and Metabase, the inability to get information about how drill 
interprets the data was really difficult to get around.  Just me .02. 

> On Apr 28, 2018, at 18:05, Paul Rogers  wrote:
> 
> Hi Rob,
> 
> Thanks for the suggestion. While this works for Hive (as you showed), it does 
> not work for CSV files:
> 
> DESCRIBE `csvh/cust.csvh`;
> +--++--+
> | COLUMN_NAME  | DATA_TYPE  | IS_NULLABLE  |
> +--++--+
> +--++--+
> 
> The typeof() function is handy, but does not report the "is nullable" (or 
> repeated) "mode" of a column, and it loses the data type if a value is null. 
> The following CSV file (with headers) uses non-nullable VARCHAR columns:
> 
> SELECT typeof(custId) FROM `csvh/cust.csvh`;
> +--+
> |  EXPR$0  |
> +--+
> | VARCHAR  |
> +--+
> 
> Now, do something similar with JSON which uses a (nullable) VARCHAR:
> 
> SELECT typeof(a) FROM `json/str-null.json`;
> +--+
> |  EXPR$0  |
> +--+
> | VARCHAR  |
> | NULL |
> +--+
> 
> Finally, use a CSV file without headers, so that all columns are returned in 
> the columns[] array:
> 
> SELECT typeof(columns) FROM `csv/cust.csv`;
> +--+
> |  EXPR$0  |
> +--+
> | VARCHAR  |
> +--+
> 
> We know that the three "VARCHAR" are different because we know how Drill 
> works internally. But, the output of sqlline does not express that knowledge.
> 
> Sqlline presents all data as strings, which often hides the data type and 
> other details, making lit look like things work better than they actually do. 
> You can see this by running a query against two JSON where a VarChar column 
> is missing from one of the files. Drill guesses "nullable Int",  Sqlline 
> shows the value as null, and typeof() shows the type as NULL, hiding the fact 
> that there is actually a schema conflict (schema change) lurking in the data 
> that manifests only if, say, you sort the data.
> 
> Bottom line: it seems that, at present, there isn't a good way at present 
> (short of writing some Java code that uses the native Drill API) to get the 
> actually, detailed type of a column with both data type and cardinality 
> ("mode").
> 
> 
> So, would be great when explaining Drill concepts, if there was a clean 
> non-code way to show people the actual structure of the data. (Yep, I know 
> Drill is open source and welcomes contributions, so I'll try to offer a 
> solution when I get time...)
> 
> Thanks,
> - Paul
> 
> 
> 
>On Thursday, April 26, 2018, 10:08:04 AM PDT, Rob Wu  
> wrote:  
> 
> Hi Paul,
> 
> You could also use DESCRIBE (https://drill.apache.org/docs/describe/).
> 
> 0: jdbc:drill:drillbit=localhost:31010> describe
> `hive.default`.`integer_table`
> . . . . . . . . . . . . . . . . . . . > ;
> +--++--+
> | COLUMN_NAME  |DATA_TYPE  | IS_NULLABLE  |
> +--++--+
> | keycolumn| CHARACTER VARYING  | YES  |
> | column1  | INTEGER| YES  |
> +--++
> 
> Best regards,
> 
> Rob
> 
> On Wed, Apr 25, 2018 at 10:12 PM, Abhishek Girish 
> wrote:
> 
>> Hey Paul,
>> 
>> You could use the typeof() function for this purpose. It takes a single
>> parameter - the column name.
>> 
>> For example:
>>> select typeof(c_current_cdemo_sk) from customer limit 1;
>> +-+
>> | EXPR$0  |
>> +-+
>> | BIGINT  |
>> +-+
>> 1 row selected (0.472 seconds)
>> 
>> 
>> On Wed, Apr 25, 2018 at 9:23 PM Paul Rogers 
>> wrote:
>> 
>>> Hi All,
>>> Anyone know if there is a non-code way to display the data types of
>>> columns returned from a Drill query? Sqlline appears to only show the
>>> column names and values. The same is true of the Drill web console.
>>> The EXPLAIN PLAN FOR ... command shows the query plan, but not type
>> (which
>>> are only known at run time.) Is there a statement, system table or some
>>> other trick to display column types in, say, Sqlline?
>>> In the past, I've gotten the types by using unit test style code. But,
>>> that is not to handy for use as an example for non-developers...
>>> Thanks,
>>> - Paul
>>> 
>>> 
>> 



Re: Display column data type without code

2018-04-28 Thread Paul Rogers
Hi Rob,

Thanks for the suggestion. While this works for Hive (as you showed), it does 
not work for CSV files:

DESCRIBE `csvh/cust.csvh`;
+--++--+
| COLUMN_NAME  | DATA_TYPE  | IS_NULLABLE  |
+--++--+
+--++--+

The typeof() function is handy, but does not report the "is nullable" (or 
repeated) "mode" of a column, and it loses the data type if a value is null. 
The following CSV file (with headers) uses non-nullable VARCHAR columns:

SELECT typeof(custId) FROM `csvh/cust.csvh`;
+--+
|  EXPR$0  |
+--+
| VARCHAR  |
+--+

Now, do something similar with JSON which uses a (nullable) VARCHAR:

SELECT typeof(a) FROM `json/str-null.json`;
+--+
|  EXPR$0  |
+--+
| VARCHAR  |
| NULL     |
+--+

Finally, use a CSV file without headers, so that all columns are returned in 
the columns[] array:

SELECT typeof(columns) FROM `csv/cust.csv`;
+--+
|  EXPR$0  |
+--+
| VARCHAR  |
+--+

We know that the three "VARCHAR" are different because we know how Drill works 
internally. But, the output of sqlline does not express that knowledge.

Sqlline presents all data as strings, which often hides the data type and other 
details, making lit look like things work better than they actually do. You can 
see this by running a query against two JSON where a VarChar column is missing 
from one of the files. Drill guesses "nullable Int",  Sqlline shows the value 
as null, and typeof() shows the type as NULL, hiding the fact that there is 
actually a schema conflict (schema change) lurking in the data that manifests 
only if, say, you sort the data.

Bottom line: it seems that, at present, there isn't a good way at present 
(short of writing some Java code that uses the native Drill API) to get the 
actually, detailed type of a column with both data type and cardinality 
("mode").


So, would be great when explaining Drill concepts, if there was a clean 
non-code way to show people the actual structure of the data. (Yep, I know 
Drill is open source and welcomes contributions, so I'll try to offer a 
solution when I get time...)

Thanks,
- Paul

 

On Thursday, April 26, 2018, 10:08:04 AM PDT, Rob Wu  
wrote:  
 
 Hi Paul,

You could also use DESCRIBE (https://drill.apache.org/docs/describe/).

0: jdbc:drill:drillbit=localhost:31010> describe
`hive.default`.`integer_table`
. . . . . . . . . . . . . . . . . . . > ;
+--++--+
| COLUMN_NAME  |    DATA_TYPE      | IS_NULLABLE  |
+--++--+
| keycolumn    | CHARACTER VARYING  | YES          |
| column1      | INTEGER            | YES          |
+--++

Best regards,

Rob

On Wed, Apr 25, 2018 at 10:12 PM, Abhishek Girish 
wrote:

> Hey Paul,
>
> You could use the typeof() function for this purpose. It takes a single
> parameter - the column name.
>
> For example:
> > select typeof(c_current_cdemo_sk) from customer limit 1;
> +-+
> | EXPR$0  |
> +-+
> | BIGINT  |
> +-+
> 1 row selected (0.472 seconds)
>
>
> On Wed, Apr 25, 2018 at 9:23 PM Paul Rogers 
> wrote:
>
> > Hi All,
> > Anyone know if there is a non-code way to display the data types of
> > columns returned from a Drill query? Sqlline appears to only show the
> > column names and values. The same is true of the Drill web console.
> > The EXPLAIN PLAN FOR ... command shows the query plan, but not type
> (which
> > are only known at run time.) Is there a statement, system table or some
> > other trick to display column types in, say, Sqlline?
> > In the past, I've gotten the types by using unit test style code. But,
> > that is not to handy for use as an example for non-developers...
> > Thanks,
> > - Paul
> >
> >
>
  

Re: Display column data type without code

2018-04-26 Thread Paul Rogers
Thanks!

A it turns out, typeof() is not documented on the Apache Drill website which is 
why I was confused. Maybe we should document it.

Just tried typeof(). Turns out it only returns the type, not the nullability or 
repeated mode. Is there a separate function to get the "mode" information?

- Paul

 

On Wednesday, April 25, 2018, 10:09:34 PM PDT, Aman Sinha 
 wrote:  
 
 You can do it through SQL using typeof() function.  Since there is no
global schema, Drill evaluates this for each row.

0: jdbc:drill:drillbit=10.10.101.41> select n_name, typeof(n_name) as
name_type, n_nationkey, typeof(n_nationkey) as nationkey_type from
cp.`tpch/nation.parquet` limit 2;

*+++--+-+*

*| **  n_name  ** | **name_type ** | **n_nationkey ** | **nationkey_type **
|*

*+++--+-+*

*| *ALGERIA  * | *VARCHAR  * | *0          * | *INT            * |*

*| *ARGENTINA * | *VARCHAR  * | *1          * | *INT            * |*

*+++--+-+*


On Wed, Apr 25, 2018 at 9:22 PM, Paul Rogers 
wrote:

> Hi All,
> Anyone know if there is a non-code way to display the data types of
> columns returned from a Drill query? Sqlline appears to only show the
> column names and values. The same is true of the Drill web console.
> The EXPLAIN PLAN FOR ... command shows the query plan, but not type (which
> are only known at run time.) Is there a statement, system table or some
> other trick to display column types in, say, Sqlline?
> In the past, I've gotten the types by using unit test style code. But,
> that is not to handy for use as an example for non-developers...
> Thanks,
> - Paul
>
>
  

Re: Display column data type without code

2018-04-26 Thread Rob Wu
Hi Paul,

You could also use DESCRIBE (https://drill.apache.org/docs/describe/).

0: jdbc:drill:drillbit=localhost:31010> describe
`hive.default`.`integer_table`
. . . . . . . . . . . . . . . . . . . > ;
+--++--+
| COLUMN_NAME  | DATA_TYPE  | IS_NULLABLE  |
+--++--+
| keycolumn| CHARACTER VARYING  | YES  |
| column1  | INTEGER| YES  |
+--++

Best regards,

Rob

On Wed, Apr 25, 2018 at 10:12 PM, Abhishek Girish 
wrote:

> Hey Paul,
>
> You could use the typeof() function for this purpose. It takes a single
> parameter - the column name.
>
> For example:
> > select typeof(c_current_cdemo_sk) from customer limit 1;
> +-+
> | EXPR$0  |
> +-+
> | BIGINT  |
> +-+
> 1 row selected (0.472 seconds)
>
>
> On Wed, Apr 25, 2018 at 9:23 PM Paul Rogers 
> wrote:
>
> > Hi All,
> > Anyone know if there is a non-code way to display the data types of
> > columns returned from a Drill query? Sqlline appears to only show the
> > column names and values. The same is true of the Drill web console.
> > The EXPLAIN PLAN FOR ... command shows the query plan, but not type
> (which
> > are only known at run time.) Is there a statement, system table or some
> > other trick to display column types in, say, Sqlline?
> > In the past, I've gotten the types by using unit test style code. But,
> > that is not to handy for use as an example for non-developers...
> > Thanks,
> > - Paul
> >
> >
>


Re: Display column data type without code

2018-04-25 Thread Abhishek Girish
Hey Paul,

You could use the typeof() function for this purpose. It takes a single
parameter - the column name.

For example:
> select typeof(c_current_cdemo_sk) from customer limit 1;
+-+
| EXPR$0  |
+-+
| BIGINT  |
+-+
1 row selected (0.472 seconds)


On Wed, Apr 25, 2018 at 9:23 PM Paul Rogers 
wrote:

> Hi All,
> Anyone know if there is a non-code way to display the data types of
> columns returned from a Drill query? Sqlline appears to only show the
> column names and values. The same is true of the Drill web console.
> The EXPLAIN PLAN FOR ... command shows the query plan, but not type (which
> are only known at run time.) Is there a statement, system table or some
> other trick to display column types in, say, Sqlline?
> In the past, I've gotten the types by using unit test style code. But,
> that is not to handy for use as an example for non-developers...
> Thanks,
> - Paul
>
>


Re: Display column data type without code

2018-04-25 Thread Aman Sinha
You can do it through SQL using typeof() function.  Since there is no
global schema, Drill evaluates this for each row.

0: jdbc:drill:drillbit=10.10.101.41> select n_name, typeof(n_name) as
name_type, n_nationkey, typeof(n_nationkey) as nationkey_type from
cp.`tpch/nation.parquet` limit 2;

*+++--+-+*

*| **  n_name  ** | **name_type ** | **n_nationkey ** | **nationkey_type **
|*

*+++--+-+*

*| *ALGERIA   * | *VARCHAR   * | *0   * | *INT* |*

*| *ARGENTINA * | *VARCHAR   * | *1   * | *INT* |*

*+++--+-+*


On Wed, Apr 25, 2018 at 9:22 PM, Paul Rogers 
wrote:

> Hi All,
> Anyone know if there is a non-code way to display the data types of
> columns returned from a Drill query? Sqlline appears to only show the
> column names and values. The same is true of the Drill web console.
> The EXPLAIN PLAN FOR ... command shows the query plan, but not type (which
> are only known at run time.) Is there a statement, system table or some
> other trick to display column types in, say, Sqlline?
> In the past, I've gotten the types by using unit test style code. But,
> that is not to handy for use as an example for non-developers...
> Thanks,
> - Paul
>
>


Display column data type without code

2018-04-25 Thread Paul Rogers
Hi All,
Anyone know if there is a non-code way to display the data types of columns 
returned from a Drill query? Sqlline appears to only show the column names and 
values. The same is true of the Drill web console.
The EXPLAIN PLAN FOR ... command shows the query plan, but not type (which are 
only known at run time.) Is there a statement, system table or some other trick 
to display column types in, say, Sqlline?
In the past, I've gotten the types by using unit test style code. But, that is 
not to handy for use as an example for non-developers...
Thanks,
- Paul