Re: Session handling with multiple drillbits

2018-09-04 Thread John Omernik
The session is the user's session, not the drillbit's. Since you are
connected to a specific drillbit, when you alter session it will work. Try
to use session stickiness or pinning on your HA solution and you will be
good to go. With my DNS round robin it picks a "connecting" drillbit and
sticks to it until the session is done. The settings apply to the Drill
cluster while in distributed mode.
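
One way to picture the pinning is the minimal Python sketch below. It assumes a hypothetical list of drillbit hostnames and uses a hash-based choice that is swapped in purely for illustration; it is not how DNS round robin or HAProxy picks a backend. The point is only that the same client always lands on the same drillbit, so its ALTER SESSION settings stay with it.

```python
import hashlib


def pick_drillbit(client_id, drillbits):
    """Map a client to one drillbit deterministically (illustrative only)."""
    if not drillbits:
        raise ValueError("need at least one drillbit")
    # Sort so the result does not depend on the order the hosts are listed in.
    ordered = sorted(drillbits)
    digest = hashlib.sha256(client_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(ordered)
    return ordered[index]


# Hypothetical hostnames, not taken from the thread.
DRILLBITS = ["drillbit-0.example", "drillbit-1.example", "drillbit-2.example"]
etl_host = pick_drillbit("etl-user", DRILLBITS)
```

Note that adding or removing a drillbit can change the mapping, so a real setup would more likely rely on the stickiness feature of the load balancer itself.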

On Tue, Sep 4, 2018 at 3:56 PM, Joe Auty  wrote:

> Thanks for your response John!
>
> We are using Drill both in an ETL context, as well as for general
> warehouse queries. One Drill user uses store format set to Parquet while
> the other uses store format set to CSV to read and write from HDFS. We are
> currently using Kubernetes Services rather than DNS round robin, but we
> only have one drillbit in the cluster while we try to sort out this issue.
>
> I'm not clear though how alter session would work with a session with
> multiple drillbits involved? We need to ensure that the right store format
> option is set, so like I said we are using different Drill usernames and
> sessions to accommodate this, but how would alter session commands apply to
> preserve these different settings across multiple drillbits?
>
> John Omernik wrote on 2018-09-04 2:48 PM:
>
>> Are these ETL-ish queries? store.format should only apply when Drill
>> is writing data; when it is reading, it uses the filenames and other hints
>> to read.
>>
>> Thus, if you do HA, say with DNS (like in the other thread), then prior
>> to running your CREATE TABLE AS (I am assuming this is what you are doing)
>> you can do ALTER SESSION SET `store.format` = 'parquet'
>>
>> Instead of using ALTER SYSTEM, you can use ALTER SESSION so it only
>> applies to the current session, regardless of foreman.
>>
>> John
>>
>>
>> On Tue, Sep 4, 2018 at 1:00 PM, Joe Auty  wrote:
>>
>>> Hello,
>>>
>>> We need to have some queries executed with store.format set to parquet and
>>> some with this option set to CSV. To date we have experimented with setting
>>> the store format for sessions controlled by using two separate user logins
>>> as a sort of context switch, but I'm wondering if the group here might have
>>> suggestions for a better way to handle this, particularly one that will
>>> scale a little better for us?
>>>
>>> The main problem we have with this approach is in introducing multiple
>>> drillbits/HA and assuring that the session and the settings we need are
>>> respected across all drillbits (whether with an HAProxy + sticky session
>>> approach or any other approach). There is a more general thread (which I've
>>> chosen not to hijack) about HA Drill from a more general standpoint, you
>>> might think of my question here as being similar, but with the need for a
>>> context switch to support multiple Drill configurations/session options.
>>>
>>> Here are the various attempts and approaches we have come up with so far.
>>> I'm wondering if you'd have any general advice as to which approach would
>>> be best for us to take, considering future plans for Drill itself. For
>>> example, if need be we can write our own plugin(s) if this is the smartest
>>> approach:
>>>
>>> - embedded the store.format option into the query itself by chaining
>>> multiple queries together separated by a comma (it appears that this
>>> doesn't work at all)
>>> - look into writing some sort of plugin to allow us to scale our current
>>> approach somehow (I realize that this is vague)
>>> - a "foreman" approach where we stick with our current approach and direct
>>> all requests to our "foreman"/master with the hope and expectation that it
>>> will farm out work to the workers/slaves
>>> - multiple clusters set with different settings
>>>
>>> Each of these approaches seems to have its pros and cons. To reiterate:
>>> what approach do you think would be the smartest and most future-proof
>>> approach for us to take?
>>>
>>> Thanks in advance!
>>>
>


Re: Session handling with multiple drillbits

2018-09-04 Thread Joe Auty

Thanks for your response John!

We are using Drill both in an ETL context, as well as for general 
warehouse queries. One Drill user uses store format set to Parquet while 
the other uses store format set to CSV to read and write from HDFS. We 
are currently using Kubernetes Services rather than DNS round robin, but 
we only have one drillbit in the cluster while we try to sort out this 
issue.


I'm not clear though how alter session would work with a session with 
multiple drillbits involved? We need to ensure that the right store 
format option is set, so like I said we are using different Drill 
usernames and sessions to accommodate this, but how would alter session 
commands apply to preserve these different settings across multiple 
drillbits?






John Omernik wrote on 2018-09-04 2:48 PM:

Are these ETL-ish queries? store.format should only apply when Drill
is writing data; when it is reading, it uses the filenames and other hints
to read.

Thus, if you do HA, say with DNS (like in the other thread), then prior
to running your CREATE TABLE AS (I am assuming this is what you are doing)
you can do ALTER SESSION SET `store.format` = 'parquet'

Instead of using ALTER SYSTEM, you can use ALTER SESSION so it only
applies to the current session, regardless of foreman.

John


On Tue, Sep 4, 2018 at 1:00 PM, Joe Auty  wrote:


Hello,

We need to have some queries executed with store.format set to parquet and
some with this option set to CSV. To date we have experimented with setting
the store format for sessions controlled by using two separate user logins
as a sort of context switch, but I'm wondering if the group here might have
suggestions for a better way to handle this, particularly one that will
scale a little better for us?

The main problem we have with this approach is in introducing multiple
drillbits/HA and assuring that the session and the settings we need are
respected across all drillbits (whether with an HAProxy + sticky session
approach or any other approach). There is a more general thread (which I've
chosen not to hijack) about HA Drill from a more general standpoint, you
might think of my question here as being similar, but with the need for a
context switch to support multiple Drill configurations/session options.

Here are the various attempts and approaches we have come up with so far.
I'm wondering if you'd have any general advice as to which approach would
be best for us to take, considering future plans for Drill itself. For
example, if need be we can write our own plugin(s) if this is the smartest
approach:

- embedded the store.format option into the query itself by chaining
multiple queries together separated by a comma (it appears that this
doesn't work at all)
- look into writing some sort of plugin to allow us to scale our current
approach somehow (I realize that this is vague)
- a "foreman" approach where we stick with our current approach and direct
all requests to our "foreman"/master with the hope and expectation that it
will farm out work to the workers/slaves
- multiple clusters set with different settings

Each of these approaches seems to have its pros and cons. To reiterate:
what approach do you think would be the smartest and most future-proof
approach for us to take?

Thanks in advance!





Re: Session handling with multiple drillbits

2018-09-04 Thread John Omernik
Are these ETL-ish queries? store.format should only apply when Drill
is writing data; when it is reading, it uses the filenames and other hints
to read.

Thus, if you do HA, say with DNS (like in the other thread), then prior
to running your CREATE TABLE AS (I am assuming this is what you are doing)
you can do ALTER SESSION SET `store.format` = 'parquet'

Instead of using ALTER SYSTEM, you can use ALTER SESSION so it only
applies to the current session, regardless of foreman.
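
A minimal sketch of that sequence in Python, assuming hypothetical table and path names. It only builds the statements; they would then be submitted in order over one connection (JDBC, ODBC, or sqlline) so that both run in the same session. The option name is wrapped in backticks because it contains a dot.

```python
# A subset of the values store.format accepts; extend as needed (assumption).
ALLOWED_FORMATS = {"parquet", "csv", "json"}


def ctas_statements(store_format, target_table, select_sql):
    """Return the statements to run, in order, on ONE Drill connection."""
    fmt = store_format.lower()
    if fmt not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported store.format: {store_format!r}")
    return [
        # Session-scoped: does not change the setting for other users.
        f"ALTER SESSION SET `store.format` = '{fmt}'",
        f"CREATE TABLE {target_table} AS {select_sql}",
    ]


# Hypothetical workspace, table, and source path.
statements = ctas_statements(
    "parquet",
    "dfs.tmp.`events_parquet`",
    "SELECT * FROM dfs.`/data/events.json`",
)
```

Running the ETL job with "csv" instead yields the same pair with only the option value changed, which avoids needing a separate login per format.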

John


On Tue, Sep 4, 2018 at 1:00 PM, Joe Auty  wrote:

> Hello,
>
> We need to have some queries executed with store.format set to parquet and
> some with this option set to CSV. To date we have experimented with setting
> the store format for sessions controlled by using two separate user logins
> as a sort of context switch, but I'm wondering if the group here might have
> suggestions for a better way to handle this, particularly one that will
> scale a little better for us?
>
> The main problem we have with this approach is in introducing multiple
> drillbits/HA and assuring that the session and the settings we need are
> respected across all drillbits (whether with an HAProxy + sticky session
> approach or any other approach). There is a more general thread (which I've
> chosen not to hijack) about HA Drill from a more general standpoint, you
> might think of my question here as being similar, but with the need for a
> context switch to support multiple Drill configurations/session options.
>
> Here are the various attempts and approaches we have come up with so far.
> I'm wondering if you'd have any general advice as to which approach would
> be best for us to take, considering future plans for Drill itself. For
> example, if need be we can write our own plugin(s) if this is the smartest
> approach:
>
> - embedded the store.format option into the query itself by chaining
> multiple queries together separated by a comma (it appears that this
> doesn't work at all)
> - look into writing some sort of plugin to allow us to scale our current
> approach somehow (I realize that this is vague)
> - a "foreman" approach where we stick with our current approach and direct
> all requests to our "foreman"/master with the hope and expectation that it
> will farm out work to the workers/slaves
> - multiple clusters set with different settings
>
> Each of these approaches seems to have its pros and cons. To reiterate:
> what approach do you think would be the smartest and most future-proof
> approach for us to take?
>
> Thanks in advance!
>


Re: Failure while reading messages from kafka

2018-09-04 Thread Khurram Faraaz
Can you please share the stack trace from drillbit.log and the version of
Kafka that you are on?

Thanks,
Khurram

On Tue, Sep 4, 2018 at 11:39 AM, Matt  wrote:

> https://issues.apache.org/jira/browse/DRILL-6723
>
> On Mon, Aug 27, 2018 at 12:27 PM Matt  wrote:
>
> > I have a Kafka topic with some non-JSON test messages in it, resulting in
> > errors "Error: DATA_READ ERROR: Failure while reading messages from kafka.
> > Recordreader was at record: 1"
> >
> > I don't seem to be able to bypass these topic messages with
> > "store.json.reader.skip_invalid_records" or even an OFFSET in the query.
> >
> > Is there a mechanism or setting I can use to query a topic and not fail on
> > malformed messages?
> >
>


Re: Failure while reading messages from kafka

2018-09-04 Thread Matt
https://issues.apache.org/jira/browse/DRILL-6723

On Mon, Aug 27, 2018 at 12:27 PM Matt  wrote:

> I have a Kafka topic with some non-JSON test messages in it, resulting in
> errors "Error: DATA_READ ERROR: Failure while reading messages from kafka.
> Recordreader was at record: 1"
>
> I don't seem to be able to bypass these topic messages with
> "store.json.reader.skip_invalid_records" or even an OFFSET in the query.
>
> Is there a mechanism or setting I can use to query a topic and not fail on
> malformed messages?
>


Session handling with multiple drillbits

2018-09-04 Thread Joe Auty

Hello,

We need to have some queries executed with store.format set to parquet 
and some with this option set to CSV. To date we have experimented with 
setting the store format for sessions controlled by using two separate 
user logins as a sort of context switch, but I'm wondering if the group 
here might have suggestions for a better way to handle this, 
particularly one that will scale a little better for us?


The main problem we have with this approach is in introducing multiple 
drillbits/HA and assuring that the session and the settings we need are 
respected across all drillbits (whether with an HAProxy + sticky session 
approach or any other approach). There is a more general thread about HA
Drill (which I've chosen not to hijack); you might think of my question
here as being similar, but with the need for a context switch to support
multiple Drill configurations/session options.


Here are the various attempts and approaches we have come up with so 
far. I'm wondering if you'd have any general advice as to which approach 
would be best for us to take, considering future plans for Drill itself. 
For example, if need be we can write our own plugin(s) if this is the 
smartest approach:


- embedded the store.format option into the query itself by chaining 
multiple queries together separated by a comma (it appears that this 
doesn't work at all)
- look into writing some sort of plugin to allow us to scale our current 
approach somehow (I realize that this is vague)
- a "foreman" approach where we stick with our current approach and
direct all requests to our "foreman"/master with the hope and
expectation that it will farm out work to the workers/slaves
- multiple clusters set with different settings

Each of these approaches seems to have its pros and cons. To reiterate: 
what approach do you think would be the smartest and most future-proof 
approach for us to take?


Thanks in advance!