Drill Questions (Developer)

2017-10-17 Thread Max Orelus
Hi, 

I'm not exactly sure which mailing list I'm supposed to ask these
developer questions on, so I apologize for any inconvenience.

I'm developing a web application that relies on Drill for its main
search/query functionality. I've gone through the documentation, but
there are a couple of things that are still unclear to me when using
Drill. If anyone on the core developer team could address any of these
questions, I would appreciate it.

1. From a terminal session I'm able to start Drill and execute queries
on the CLI. One task I can do from the terminal is CREATE TEMPORARY
TABLE name AS query; right after executing that, I'm able to query the
temporary table for as long as I keep the terminal session open.

I would like to be able to do this from a REST client. Is there any way
to chain SQL queries when making a request to POST
http://localhost:8047/query.json? When I submit a query via the web
console or the REST API, the temporary table gets created, but when I
issue another request against the tmp_table I just created, it fails
because the table has already been dropped by that point. Is there a
way to chain two queries using the REST API so that they execute one
after another and the response contains the last query's results?
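
A minimal sketch of the two separate requests described above, assuming
the endpoint accepts a JSON body with "queryType" and "query" fields. The
source path and the LIMIT are hypothetical, and the requests are only
built here, not sent:

```python
import json
import urllib.request

# Endpoint taken from the message above.
DRILL_URL = "http://localhost:8047/query.json"


def build_query_request(sql, url=DRILL_URL):
    """Build (but do not send) a POST request carrying one SQL statement."""
    body = json.dumps({"queryType": "SQL", "query": sql}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical source path; tmp_table is the name used in the message.
create_req = build_query_request(
    "CREATE TEMPORARY TABLE tmp_table AS SELECT * FROM dfs.`/data/folderA`"
)
select_req = build_query_request("SELECT * FROM tmp_table LIMIT 10")

# With a running Drill instance, each request would be sent with
# urllib.request.urlopen(create_req). Sent as two independent HTTP calls,
# the second one is the request that fails as described above.
```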

2. I have streams of data being written to separate folders (folderA,
folderB, folderC) in Parquet format. The streams share a set of common
columns, but each also has unique columns that apply only to that
stream. I know I can query all streams at once by using a wildcard
pattern for the directories, and the results come back with an extra
column named dir0 that identifies the directory each record came
from.
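
A rough sketch of the kind of wildcard query described above. The dfs
plugin name, the /data/streams root, and the shared column names (id,
created_at) are all hypothetical placeholders:

```python
def build_stream_query(root, shared_columns, pattern="folder*", plugin="dfs"):
    """Return a SELECT over every directory matching root/pattern.

    dir0 is the directory column mentioned in the message; the other
    columns are the ones assumed to be common to all streams.
    """
    cols = ", ".join(f"`{c}`" for c in shared_columns)
    return f"SELECT dir0, {cols} FROM {plugin}.`{root}/{pattern}`"


# Hypothetical root directory and shared column names.
query = build_stream_query("/data/streams", ["id", "created_at"])
print(query)
```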

I'm wondering if there's a way to sort the combined results. In my
trial and error so far, I have not been able to sort when querying
across different stream schemas; sorting only works when I query one
schema at a time.

Is there a way to construct my query that would make this possible?

3. Do you have examples of constructing a histogram-like query against
sample data by date?
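
To make "histogram-like" concrete, here is a small sketch of the result
shape in question: one count per calendar day. The sample timestamps are
made up, and the SQL string is an untested guess at an equivalent
GROUP BY query (the event_time column and the path are hypothetical):

```python
from collections import Counter
from datetime import datetime

# Untested sketch of an equivalent query; column and path are hypothetical.
HISTOGRAM_SQL = (
    "SELECT CAST(`event_time` AS DATE) AS `day`, COUNT(*) AS `n` "
    "FROM dfs.`/data/streams/folderA` "
    "GROUP BY CAST(`event_time` AS DATE) "
    "ORDER BY `day`"
)


def daily_histogram(timestamps):
    """Count ISO-8601 timestamps per calendar day, sorted by day."""
    counts = Counter(datetime.fromisoformat(ts).date() for ts in timestamps)
    return sorted(counts.items())


# Made-up sample data.
sample = ["2017-10-15T09:30:00", "2017-10-15T17:45:00", "2017-10-16T08:00:00"]
histogram = daily_histogram(sample)
for day, n in histogram:
    print(day, "#" * n)
```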

Thank you for your time.

Best regards,

--  
Max Orelus
+1 (202) 361-9946
maxore...@fastmail.com


Retrieve Column Types from Parquet File?

2017-09-23 Thread Max Orelus
Hi,

I recently started learning Apache Drill, and I've been trying to
figure something out for a while now, but I can't seem to find any
resources that document what I'm trying to do. Essentially, I'm trying
to query the data type of each column in a Parquet file using Drill.
I've looked through the documentation on the Drill website, but nothing
I have tried has worked.

I will admit that I'm not by any means a database administrator, so
this is somewhat outside my area of knowledge. I'm a web developer who
is integrating Drill querying into my applications.

So for instance, I have a Parquet file that has the following columns:

name | address | age | occupation | timestamp

I would like to be able to query that Parquet file and find out what
each column's type is, in the following form:

| field      | type   |
|------------|--------|
| name       | string |
| address    | string |
| age        | int    |
| occupation | string |
| timestamp  | date   |

If you can point me in the right direction or potentially give me an
example of how I would write a query to output the above, I would really
appreciate it.
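
One possible direction, sketched under assumptions: Drill has a typeof()
function that reports the type of a value, so a query can apply it to each
column of a single row. The file path and the dfs plugin name below are
hypothetical. The query returns one row with one type per column rather
than the two-column layout shown above, and the labels it returns will be
Drill's own type names, not "string" or "int":

```python
def build_type_query(path, columns, plugin="dfs"):
    """Return a query that applies typeof() to each column of one row."""
    select_list = ", ".join(f"typeof(`{c}`) AS `{c}`" for c in columns)
    return f"SELECT {select_list} FROM {plugin}.`{path}` LIMIT 1"


# Column names from the message; the file path is a placeholder.
columns = ["name", "address", "age", "occupation", "timestamp"]
type_query = build_type_query("/data/people.parquet", columns)
print(type_query)
```

Because typeof() looks at values rather than file metadata, the result
reflects only the row that was sampled.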

Thank you for your time.

Best regards,

--  
Max Orelus
+1 (202) 361-9946
maxore...@fastmail.com