random query generator patch submit request

2016-11-30 Thread Michael Brown
Hello,

This is a patch to the random query generator, not affected by GVO:

https://gerrit.cloudera.org/#/c/5034/

Since GVO doesn't touch this code path, and Jenkins executors are precious,
could a Committer please verify/submit this change directly?

Thanks


Re: random query generator patch submit request

2016-11-30 Thread Sailesh Mukil
Done.



Re: random query generator patch submit request

2016-11-30 Thread Michael Brown
Thanks!

While you're at it, asf/master needs to be brought up to date with
asf_gerrit/master:

* 585ed5a - (asf_gerrit/master) IMPALA-4450: qgen: use string concatenation operator for postgres queries (7 minutes ago)
* 0f62bf3 - IMPALA-4550: Fix CastExpr analysis for substituted slots (5 hours ago)
* 9f497ba - IMPALA-2890: Support ALTER TABLE statements for Kudu tables (12 hours ago)
* 90bf40d - IMPALA-4553: ntpd must be synchronized for kudu to start. (16 hours ago)
* 2680580 - (asf/master) IMPALA-4512: Add a script that builds Impala on stock Ubuntu 14.04. (18 hours ago)
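
The listing above has the shape of a decorated one-line `git log`: every commit printed before the one tagged `(asf/master)` is still missing from that remote. A minimal sketch of pulling those hashes out of such a listing, assuming exactly the line layout shown in this message (the function name is hypothetical; with a local clone, `git log --oneline asf/master..asf_gerrit/master` should list the same commits directly):

```python
import re

# Sample listing copied from the message above, one commit per line.
LOG = """\
* 585ed5a - (asf_gerrit/master) IMPALA-4450: qgen: use string concatenation operator for postgres queries (7 minutes ago)
* 0f62bf3 - IMPALA-4550: Fix CastExpr analysis for substituted slots (5 hours ago)
* 9f497ba - IMPALA-2890: Support ALTER TABLE statements for Kudu tables (12 hours ago)
* 90bf40d - IMPALA-4553: ntpd must be synchronized for kudu to start. (16 hours ago)
* 2680580 - (asf/master) IMPALA-4512: Add a script that builds Impala on stock Ubuntu 14.04. (18 hours ago)
"""

# "* <hash> - (<refs>) <subject>", where the "(<refs>) " part is optional.
LINE = re.compile(r"^\* (?P<sha>[0-9a-f]+) - (?:\((?P<refs>[^)]*)\) )?(?P<subject>.*)$")

def commits_ahead(log_text, behind_ref):
    """Return the hashes listed before the commit decorated with behind_ref."""
    ahead = []
    for line in log_text.splitlines():
        match = LINE.match(line)
        if not match:
            continue
        refs = [r.strip() for r in (match.group("refs") or "").split(",")]
        if behind_ref in refs:
            return ahead
        ahead.append(match.group("sha"))
    raise ValueError(f"{behind_ref} not found in listing")

print(commits_ahead(LOG, "asf/master"))
```

Here four commits sit ahead of `asf/master`, which matches the request to bring it up to date.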




Re: random query generator patch submit request

2016-11-30 Thread Sailesh Mukil
Just did a push_to_asf now, so it should be in sync.



[Toolchain-CR] Fix build failures from git clean

2016-11-30 Thread Tim Armstrong (Code Review)
Tim Armstrong has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/5285

Change subject: Fix build failures from git clean
..

Fix build failures from git clean

We've seen build failures with the following error, even when git clean
is run from that directory and the directory is a valid git repo.

  fatal: '/path/to/toolchain' is outside repository

The fix is to avoid providing the path, which seems to make git clean
happier.
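
One way to picture the fix, sketched in Python rather than the shell used by `functions.sh`: run `git clean` with the target directory as the working directory and no path argument, instead of passing the directory as a pathspec. The `-f -d -x` flags and the function names are assumptions for illustration; the actual diff is not shown in this message.

```python
import subprocess

def clean_invocation(repo_dir, flags=("-f", "-d", "-x")):
    """Build a git clean call that relies on cwd rather than a path argument."""
    args = ["git", "clean", *flags]
    # No pathspec is appended; the working directory selects what gets cleaned.
    return args, {"cwd": repo_dir, "check": True}

def run_clean(repo_dir):
    """Run the command built above; raises CalledProcessError on failure."""
    args, kwargs = clean_invocation(repo_dir)
    return subprocess.run(args, **kwargs)

args, kwargs = clean_invocation("/path/to/toolchain")
print(args, kwargs["cwd"])
```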

Change-Id: I4cbc9b74e43dcde2f08a92a1ec94a4bd74bfd416
---
M functions.sh
1 file changed, 5 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Toolchain refs/changes/85/5285/1
-- 
To view, visit http://gerrit.cloudera.org:8080/5285
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I4cbc9b74e43dcde2f08a92a1ec94a4bd74bfd416
Gerrit-PatchSet: 1
Gerrit-Project: Toolchain
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 


[Toolchain-CR] Fix build failures from git clean

2016-11-30 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: Fix build failures from git clean
..


Patch Set 1:

I'm currently testing this out, but it looks like this got it past where it was 
failing earlier.

-- 
To view, visit http://gerrit.cloudera.org:8080/5285
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I4cbc9b74e43dcde2f08a92a1ec94a4bd74bfd416
Gerrit-PatchSet: 1
Gerrit-Project: Toolchain
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Toolchain-CR] Fix build failures from git clean

2016-11-30 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has posted comments on this change.

Change subject: Fix build failures from git clean
..


Patch Set 1: Code-Review+2

thanks!



Fwd: Adding some guard rails to Kudu

2016-11-30 Thread Todd Lipcon
Hey Impala folks,

Just FYI, I started the below thread on the Kudu lists about adding some
limits/guard rails to various dimensions of Kudu data/metadata. Please take
a look from the Impala perspective and let us know if you foresee any
issues with these limits.

Just to repeat one thing: I know many SQL workloads require more than 300
columns in a table, but right now Kudu isn't great in that realm, so we're
setting the limits conservatively. The idea is that over time as we improve
test coverage we'll raise the limits.

-Todd

-- Forwarded message --
From: Todd Lipcon 
Date: Wed, Nov 30, 2016 at 3:30 PM
Subject: Re: Adding some guard rails to Kudu
To: u...@kudu.apache.org, dev 


BTW I filed a JIRA here and started linking related issues to it:
https://issues.apache.org/jira/browse/KUDU-1775


On Wed, Nov 30, 2016 at 3:25 PM, Todd Lipcon  wrote:

> Hey folks,
>
> I've started working on a few patches to add "guard rails" to various
> user-specified dimensions in Kudu. In particular, I'm planning to add
> limits to the following:
>
> - max number of columns in a table (proposal: 300)
> - max replication factor (proposal: 7)
> - max table name or column name length (proposal: 256)
> - max size of a binary/string column cell value (proposal: 64kb)
>
> The reasoning is that, even though in some cases we don't know a specific
> issue that will happen outside these limits, we've done very little testing
> (and have no automated testing) outside of these ranges. In some cases, we
> do know that there is a certain threshold that will cause a big problem (eg
> large cell sizes can cause tablet servers to crash). In other cases, it's
> just "unknown territory".
>
> In all cases, I'm planning on making the limits overridable via an
> "unsafe" configuration flag. That means that a user can run with
> "--unlock_unsafe_flags --max_identifier_length=1000" if they want to, but
> they're explicitly accepting some risk that they're entering untested
> territory.
>
> Of course, in all cases, if we hear that there are people who are bumping
> the maxes higher than the defaults and having good results, we can consider
> raising the maximum, but I think it's smarter to start conservatively low
> and raise later as we increase test coverage. Also, I'm sure down the road
> we'll add features such as BLOB support or sparse column support, and at
> that time we can remove the corresponding guard rails.
>
> I'm sending this note to both user@ and dev@ to solicit feedback. Are
> there any other dimensions people can think of where we should probably add
> guard-rails? Is anyone out there already outside of the above ranges and
> can make a case that we're being too conservative?
>
> Thanks
> -Todd
> --
> Todd Lipcon
> Software Engineer, Cloudera
>



-- 
Todd Lipcon
Software Engineer, Cloudera



-- 
Todd Lipcon
Software Engineer, Cloudera
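
The proposal in the forwarded note can be pictured as a table of default limits plus an opt-in override. The sketch below is illustrative Python, not Kudu code: the limit names, the `check_table` and `check_cell` helpers, and the `unlock_unsafe` parameter are hypothetical stand-ins for the proposed defaults and the `--unlock_unsafe_flags` mechanism, and "64kb" is assumed to mean 64 * 1024 bytes.

```python
DEFAULT_LIMITS = {
    "max_columns": 300,                # proposal: 300 columns per table
    "max_replication_factor": 7,       # proposal: 7
    "max_identifier_length": 256,      # proposal: 256 for table and column names
    "max_cell_size_bytes": 64 * 1024,  # proposal: 64kb per binary/string cell
}

def effective_limits(overrides=None, unlock_unsafe=False):
    """Return the limits in force; overrides are refused unless explicitly unlocked."""
    overrides = overrides or {}
    unknown = set(overrides) - set(DEFAULT_LIMITS)
    if unknown:
        raise KeyError(f"unknown limit(s): {sorted(unknown)}")
    if overrides and not unlock_unsafe:
        raise PermissionError("changing a guard rail requires unlocking unsafe flags")
    return {**DEFAULT_LIMITS, **overrides}

def check_table(name, columns, replication, limits):
    """Return a list of human-readable violations (empty when within limits)."""
    problems = []
    if len(columns) > limits["max_columns"]:
        problems.append(f"{len(columns)} columns exceeds {limits['max_columns']}")
    if replication > limits["max_replication_factor"]:
        problems.append(
            f"replication factor {replication} exceeds {limits['max_replication_factor']}"
        )
    for ident in [name, *columns]:
        if len(ident) > limits["max_identifier_length"]:
            problems.append(
                f"identifier of length {len(ident)} exceeds {limits['max_identifier_length']}"
            )
    return problems

def check_cell(value, limits):
    """True when a binary/string cell value fits within the cell-size limit."""
    return len(value) <= limits["max_cell_size_bytes"]

long_name = "c" * 1000
print(check_table("t", [long_name], 3, effective_limits()))
print(check_table("t", [long_name], 3,
                  effective_limits({"max_identifier_length": 1000}, unlock_unsafe=True)))
```

The first call reports one violation; the second reports none because the limit was raised deliberately, mirroring the "explicitly accepting some risk" trade-off described in the note.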


Re: Adding some guard rails to Kudu

2016-11-30 Thread Matthew Jacobs
1) I think that makes sense, though we need to know what the error
conditions are, when those errors occur (e.g. at table creation, add
column, writing data) and need tests to validate the expected negative
cases. I can certainly guess, though I'd like for the affected API
calls w/ new expected behavior to be documented somewhere so we can
make changes accordingly.
2) Does this mean you'll test up to these limits?

Thanks



Re: Adding some guard rails to Kudu

2016-11-30 Thread Todd Lipcon
On Wed, Nov 30, 2016 at 3:52 PM, Matthew Jacobs  wrote:

> 1) I think that makes sense, though we need to know what the error
> conditions are, when those errors occur (e.g. at table creation, add
> column, writing data) and need tests to validate the expected negative
> cases. I can certainly guess, though I'd like for the affected API
> calls w/ new expected behavior to be documented somewhere so we can
> make changes accordingly.
>

For the schema-related ones, they'll behave the same as any other invalid
schema does today (eg trying to use the same column name twice, or using an
existing table name, etc).

In terms of API docs, I was hoping that the documentation can be general
enough about the types of exceptions thrown for general categories rather
than having to translate every bit of validation logic into English on the
API docs.

In other words, we should document that createTable() throws an exception
if the schema was invalid, but I don't think we need to re-document all of
the ways in which a schema can be invalid in the API docs, do we? I think
more general user-facing documentation is probably the appropriate place.
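
A sketch of what "document the category, not every rule" might look like to a caller. The `InvalidSchemaError` class and this `create_table` function are hypothetical and written in Python purely for illustration; they are not the Kudu client API.

```python
class InvalidSchemaError(ValueError):
    """Raised for any schema that fails validation; the message carries the detail."""

def create_table(name, columns, max_identifier_length=256):
    """Create a table.

    Raises:
        InvalidSchemaError: if the schema is invalid (rules live in the user guide).
    """
    if len(set(columns)) != len(columns):
        raise InvalidSchemaError("duplicate column name")
    for ident in [name, *columns]:
        if len(ident) > max_identifier_length:
            raise InvalidSchemaError(
                f"identifier longer than {max_identifier_length} characters"
            )
    return {"name": name, "columns": list(columns)}

def describe_failure(name, columns):
    """Caller-side handling: one except clause covers old and new validation rules."""
    try:
        create_table(name, columns)
    except InvalidSchemaError as err:
        return f"invalid schema: {err}"
    return "ok"

print(describe_failure("t", ["a", "a"]))    # an existing kind of invalid schema
print(describe_failure("t", ["a" * 300]))   # a new guard-rail violation
```

Under this arrangement a caller that already handles the general category needs no code change when a new limit is introduced; only the message text differs.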

Test-wise, I agree that some coverage end-to-end from Impala would be nice.
We're adding tests that go through our API to validate it, but validating
that the user-exposed error is reasonable too is of course a good idea.

> 2) Does this mean you'll test up to these limits?
>

Yea, that's the long-term intention and would be ideal. However, for now,
I'm not necessarily guaranteeing that we've got automated testing up to
these limits today in all combinations. For example, 300 columns works, and
64kb cells works, but maybe we'd have an issue with a table where all 300
columns contain 64kb cells in every row.
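
For scale, a quick calculation of the combination mentioned above, assuming "kb" means 1024 bytes and ignoring any per-row or per-cell overhead:

```python
MAX_COLUMNS = 300
MAX_CELL_BYTES = 64 * 1024

# Payload of a single row in which every column holds a maximum-size cell.
row_bytes = MAX_COLUMNS * MAX_CELL_BYTES
row_mib = row_bytes / (1024 * 1024)

print(row_bytes)  # 19660800
print(row_mib)    # 18.75
```

A single row at both maximums would carry close to 19 MiB, so staying inside each individual limit does not by itself keep rows small.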

So, the hope with this patch series isn't to 100% constrain users such that
they can never get into a less-tested area. But hopefully we've cut out 95%
of the space here and prevented some users from shooting themselves in the
foot. For example, we recently had a case where a bug in the user
application mis-parsed some input data and tried to insert a 20MB cell,
which ended up causing an outage, and this relatively simplistic patch
would have prevented that.

Put another way, we're telling users "if you go above this range, you may
have problems" but not guaranteeing the logical inverse "if you stay below
this range, you will never have a problem." That doesn't make it less
useful, though, IMO :)

-Todd



Re: Adding some guard rails to Kudu

2016-11-30 Thread Matthew Jacobs
On Wed, Nov 30, 2016 at 4:03 PM, Todd Lipcon  wrote:
> On Wed, Nov 30, 2016 at 3:52 PM, Matthew Jacobs  wrote:
>
>> 1) I think that makes sense, though we need to know what the error
>> conditions are, when those errors occur (e.g. at table creation, add
>> column, writing data) and need tests to validate the expected negative
>> cases. I can certainly guess, though I'd like for the affected API
>> calls w/ new expected behavior to be documented somewhere so we can
>> make changes accordingly.
>>
>
> For the schema-related ones, they'll behave the same as any other invalid
> schema does today (eg trying to use the same column name twice, or using an
> existing table name, etc).
>
> In terms of API docs, I was hoping that the documentation can be general
> enough about the types of exceptions thrown for general categories rather
> than having to translate every bit of validation logic into English on the
> API docs.
>
> In other words, we should document that createTable() throws an exception
> if the schema was invalid, but I don't think we need to re-document all of
> the ways in which a schema can be invalid in the API docs, do we? I think
> more general user-facing documentation is probably the appropriate place.

I didn't mean to imply I'd like you to redocument everything, but
rather I'd like to know what new failure modes are possible, e.g. will
KuduSession::ApplyRow or something else now fail if a cell value is >
64kb with a different error code that we didn't see before? Even
though we do attempt to handle all error statuses, if we can
reasonably expect some new error I'd like to test it. If there are
none, that's even better :)
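
A boundary test along these lines could be sketched as follows. `apply_row` and `CellTooLargeError` are hypothetical placeholders written in Python, not the Kudu client's `KuduSession` API, and the 64 * 1024 byte limit assumes the proposed default.

```python
CELL_LIMIT = 64 * 1024  # assumed proposed default, in bytes

class CellTooLargeError(Exception):
    """Hypothetical error for a cell value above the limit."""

def apply_row(cell):
    """Stand-in for a write call that rejects oversized cells."""
    if len(cell) > CELL_LIMIT:
        raise CellTooLargeError(f"cell of {len(cell)} bytes exceeds {CELL_LIMIT}")

def rejects(cell):
    """True when the write is refused with the expected error type."""
    try:
        apply_row(cell)
    except CellTooLargeError:
        return True
    return False

print(rejects(b"x" * CELL_LIMIT))        # exactly at the limit
print(rejects(b"x" * (CELL_LIMIT + 1)))  # one byte over
```

Checking both sides of the boundary is what distinguishes a new, expected error from a generic failure that merely happens to occur with large values.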

>
> Test-wise, I agree that some coverage end-to-end from Impala would be nice.
> We're adding tests that go through our API to validate it, but validating
> that the user-exposed error is reasonable too is of course a good idea.
>
>> 2) Does this mean you'll test up to these limits?
>>
>
> Yea, that's the long-term intention and would be ideal. However, for now,
> I'm not necessarily guaranteeing that we've got automated testing up to
> these limits today in all combinations. For example, 300 columns works, and
> 64kb cells works, but maybe we'd have an issue with a table where all 300
> columns contain 64kb cells in every row.
>
> So, the hope with this patch series isn't to 100% constrain users such that
> they can never get into a less-tested area. But hopefully we've cut out 95%
> of the space here and prevented some users from shooting themselves in the
> foot. For example, we recently had a case where a bug in the user
> application mis-parsed some input data and tried to insert a 20MB cell,
> which ended up causing an outage, and this relatively simplistic patch
> would have prevented that.
>
> Put another way, we're telling users "if you go above this range, you may
> have problems" but not guaranteeing the logical inverse "if you stay below
> this range, you will never have a problem." That doesn't make it less
> useful, though, IMO :)

Yeah obviously there's a huge test space, I was just wondering if it
means at least some of it will be touched.

It's good to have some reasonably defined boundaries, and we should
adjust our Kudu integration testing accordingly.

Thanks

>
> -Todd
>
>

Re: Adding some guard rails to Kudu

2016-11-30 Thread Todd Lipcon
BTW I am doing some testing in Impala, and it seems like Impala silently
truncates column names to 128 characters.

Even more fun is:

create table todd_test
(
int,
xyy
int)

(two long names that differ only beyond the 128-character truncation point).
This results in:

ImpalaRuntimeException: Error making 'createTable' RPC to Hive Metastore:
CAUSED BY: MetaException: Add request failed : INSERT INTO `COLUMNS_V2`
(`CD_ID`,`COMMENT`,`COLUMN_NAME`,`TYPE_NAME`,`INTEGER_IDX`) VALUES
(?,?,?,?,?)

So given the 128-character Impala limit, perhaps the Kudu limit should be
more than 256, since you're creating Kudu table names as
'::'.
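
The collision can be reproduced in miniature. The names below are made up (the original column names are not preserved in this message), and `truncate` only imitates the silent 128-character cut described above:

```python
TRUNCATE_AT = 128

def truncate(identifier, limit=TRUNCATE_AT):
    """Imitate silently keeping only the first `limit` characters."""
    return identifier[:limit]

# Two hypothetical 130-character names that differ only in their last two characters.
first = "x" * 128 + "aa"
second = "x" * 128 + "yy"

print(first != second)                      # distinct as written
print(truncate(first) == truncate(second))  # identical once truncated
```

Once both names collapse to the same stored value, the table would in effect declare the same column twice, which would be consistent with the metastore insert failing.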



Re: Adding some guard rails to Kudu

2016-11-30 Thread Matthew Jacobs
Hm nice catch. I don't think we should do this silently... I filed
https://issues.cloudera.org/browse/IMPALA-4563
I'm not sure if we have similar behavior on db names and table names,
or if there are explicit max lengths there to consider.
