FYI: Switchover to Zulu OpenJDK 7 build on dist-test slaves

2017-11-21 Thread Todd Lipcon
Hey folks,

We recently rebuilt the Docker image used to run dist-test slaves, which
caused it to upgrade to OpenJDK 7u151 from the openjdk-r PPA for Ubuntu
14.04. Apparently that build has some bug which caused Hadoop unit tests
to leave a bunch of zombie processes spinning at 100% CPU (and the tests
themselves to time out).

Unfortunately the old version of OpenJDK that worked fine no longer seems
to be published on the PPA, or really anywhere on the web. So, we've
switched over to Azul Systems' "Zulu" binary package for OpenJDK (from
https://www.azul.com/downloads/zulu/zulu-linux/ ). The new default 'java'
on the path is:


# java -version
openjdk version "1.7.0_161"
OpenJDK Runtime Environment (Zulu 7.21.0.3-linux64) (build 1.7.0_161-b14)
OpenJDK 64-Bit Server VM (Zulu 7.21.0.3-linux64) (build 24.161-b14, mixed
mode)

For Kudu this shouldn't make a huge difference since our Java tests don't
run on dist-test, except for potentially the hms_client-test (which starts
the HMS). So, I'll keep an eye on this one as the slaves transition over to
the new image.

-Todd
-- 
Todd Lipcon
Software Engineer, Cloudera


Re: INT128 Column Support Interest

2017-11-21 Thread Grant Henke
>
> I'm somewhat against such a configuration. This being a server-side
> configuration results in Kudu deployments in different environments having
> different sets of available types, which seems very difficult for
> downstream users to deal with.


Yeah I agree. I am not super into the idea.

Even though "least common denominator" kind
> of sucks, it's also not a bad policy for software that aims to be part of a
> pretty diverse ecosystem.


I think because Kudu is generally the "bottom" layer it would be best to
build new features/types from the bottom up where possible, as opposed to
always playing catch-up in the ecosystem. That said, I think that's only
true given that there is interest or demand for the feature or data type.
It doesn't look like that demand exists in this case though.

> I think without clear user demand for >28 digits it's just not worth the
> complexity.


Agreed. Not much response here so we should drop this for now.

> That's a good point. However, I'm guessing that users are more likely to
> intuitively know that "9 digits is enough" more easily than they will know
> that "64 bits is enough". In my experience people underestimate the range
> of 64-bit integers and might choose INT128 if available even if they have
> no need for anywhere near that range.


That makes sense. Instead of supporting INT128 for larger ranges, if there
is demand for more digits we could add support for decimal precisions 39 to
77 with internal INT256 (or VarInt) support.
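
For intuition, here's a quick back-of-the-envelope check of how many full
decimal digits each signed width can hold (a rough sketch in Python, just
for the arithmetic; in Java these values would live in a BigInteger):

# How many complete decimal digits fit in a signed N-bit integer?
for bits in (32, 64, 128, 256):
    max_val = (1 << (bits - 1)) - 1      # 2^(N-1) - 1
    # max_val is never all 9s, so its top digit is only partially usable.
    full_digits = len(str(max_val)) - 1
    print("INT%d: %d full decimal digits" % (bits, full_digits))

# Prints 9, 18, 38 and 76 respectively, which is roughly where a 39-to-77
# precision range for an INT256-backed decimal comes from (the 77th digit
# is only partially covered).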


On Mon, Nov 20, 2017 at 6:51 PM, Todd Lipcon  wrote:

> On Mon, Nov 20, 2017 at 1:12 PM, Grant Henke  wrote:
>
> > Thank you for the feedback. Below are some responses.
> >
> > Do we have a compatible SQL type to map this to in Spark SQL, Impala,
> > > Presto, etc? What type would we map to in Java?
> >
> >
> > In Java we would map to a BigInteger. There isn't a perfectly natural
> > mapping for SQL that I know of. It has been mentioned in the past that we
> > could have server side flags to disable/enable the ability to create
> > columns of certain types to prevent users from creating tables that are
> not
> > readable by certain integrations. This problem exists today with the
> BINARY
> > column type.
> >
>
> I'm somewhat against such a configuration. This being a server-side
> configuration results in Kudu deployments in different environments having
> different sets of available types, which seems very difficult for
> downstream users to deal with. Even though "least common denominator" kind
> of sucks, it's also not a bad policy for software that aims to be part of a
> pretty diverse ecosystem.
>
>
>
> >
> > > Why not just _not_ expose it and only expose decimal.
> >
> >
> > Technically decimal only supports 28 9's where INT128 can support
> > slightly larger numbers. There may also be more overhead dealing with a
> > decimal type, though I am not positive about that.
> >
>
> I think without clear user demand for >28 digits it's just not worth the
> complexity.
>
>
> >
> > > Encoders: like Dan mentioned, it seems like we might not be able to do a
> > > very efficient job of encoding these very large integers. Stuff like
> > > bitshuffle, SIMD bitpacking, etc, isn't really designed for such large
> > > values. So, I'm a little afraid that we'll end up only with PLAIN and
> > > people will be upset with the storage overhead and performance.
> >
> >
> > > Aren't we going to need efficient encodings in order to make decimal
> > > work well, anyway?
> >
> >
> > We will need to ensure a performant encoding exists for INT128 to make
> > decimals with precision >= 18 work well anyway. We should likely have
> > parity with the other integer types to reduce any confusion about
> > differing precisions having different encoding considerations, although
> > Presto documents that precisions >= 18 are slower than the others. We
> > could do something similar and follow on with improvements.
> >
> > In the current int128 internal patch I know that the RLE doesn't work for
> > int128. I don't have a lot of background on Kudu's encoding details, so
> > investigating encodings further is one of my next steps.
> >
>
> That's a good point. However, I'm guessing that users are more likely to
> intuitively know that "9 digits is enough" more easily than they will know
> that "64 bits is enough". In my experience people underestimate the range
> of 64-bit integers and might choose INT128 if available even if they have
> no need for anywhere near that range.
>
> -Todd
>
>
> >
> > On Thu, Nov 16, 2017 at 5:30 PM, Dan Burkert 
> > wrote:
> >
> > > Aren't we going to need efficient encodings in order to make decimal
> work
> > > well, anyway?
> > >
> > > - Dan
> > >
> > > On Thu, Nov 16, 2017 at 2:54 PM, Todd Lipcon 
> wrote:
> > >
> > >> On Thu, Nov 16, 2017 at 2:28 PM, Dan Burkert 
> > >> wrote:
> > >>
> > >> > I think it would be useful.  As far as I've seen the main costs in
> > >> > carrying data types are in writing performant encoders, and updating
> 

Re: Flaky tests?

2017-11-21 Thread Todd Lipcon
On Tue, Nov 21, 2017 at 10:13 AM, Alexey Serbin 
wrote:

> I'll take a look at delete_table-itest (at least I have had a patch in
> review for one flake there for a long time).
>
> BTW, it would be much better if it were possible to see the type of failed
> build in the dashboard (as it was prior to quasar).  Is the type of a build
> something inherently impossible to expose from quasar?
>

I think it should be possible by just setting the BUILD_ID environment
variable appropriately before reporting the test result. That information
should be available in the environment as $BUILD_TYPE or somesuch. I think
Ed is out this week, but maybe he can take a look at this when he gets back?
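
Something like this, roughly (a sketch assuming the reporting step is a
Python script and that the variable really is called BUILD_TYPE -- both of
those are guesses):

import os

# Guessed names: fold the build type (e.g. ASAN, TSAN, RELEASE) into the
# build identifier before results are reported, so the dashboard can show
# which kind of build a flaky failure came from.
build_type = os.environ.get("BUILD_TYPE", "UNKNOWN")
build_id = os.environ.get("BUILD_ID", "local")
os.environ["BUILD_ID"] = "%s-%s" % (build_type, build_id)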

-Todd


>
>
> Best regards,
>
> Alexey
>
>
> On 11/20/17 11:50 AM, Todd Lipcon wrote:
>
>> Hey folks,
>>
>> It seems some of our tests have gotten pretty flaky lately again. Some of
>> it is likely due to churn in test infrastructure (running on a different
>> VM
>> type now I think) but it makes me a little nervous to go into the 1.6
>> release with some tests at 5%+ flaky.
>>
>> Can we get some volunteers to triage the top couple most flaky? Note that
>> "triage" doesn't necessarily mean "fix" -- just want to investigate to the
>> point that we can decide it's likely to be a test issue or known existing
>> issue rather than a regression before the release.
>>
>> I'll volunteer to look at consensus_peers-itests (the top most flaky one).
>>
>> -Todd
>>
>
>


-- 
Todd Lipcon
Software Engineer, Cloudera


Re: Flaky tests?

2017-11-21 Thread Alexey Serbin
I'll take a look at delete_table-itest (at least I have had a patch in 
review for one flake there for a long time).


BTW, it would be much better if it were possible to see the type of 
failed build in the dashboard (as it was prior to quasar).  Is the type 
of a build something inherently impossible to expose from quasar?



Best regards,

Alexey

On 11/20/17 11:50 AM, Todd Lipcon wrote:

Hey folks,

It seems some of our tests have gotten pretty flaky lately again. Some of
it is likely due to churn in test infrastructure (running on a different VM
type now I think) but it makes me a little nervous to go into the 1.6
release with some tests at 5%+ flaky.

Can we get some volunteers to triage the top couple most flaky? Note that
"triage" doesn't necessarily mean "fix" -- just want to investigate to the
point that we can decide it's likely to be a test issue or known existing
issue rather than a regression before the release.

I'll volunteer to look at consensus_peers-itests (the top most flaky one).

-Todd