Re: Parquet min/max statistics & null values

2017-11-01 Thread Lars Volker
Hi Bruno,

Parquet-mr currently doesn't support the new stats fields:
https://issues.apache.org/jira/browse/PARQUET-1025

For development purposes I used a small Python script a while ago. I pushed
it here so you can check it out if you'd like, but it's really only meant
for development purposes:
https://github.com/lekv/impala/tree/dump_parquet_stats

Cheers, Lars


On Sat, Oct 28, 2017 at 1:19 PM, Bruno Quinart <bquin...@icloud.com> wrote:

> Hi
>
> Thanks for your replies.
>
> An example query:
> SELECT * FROM reasonably_sized_partitioned_table
> WHERE partition_key_dtnr = 20171028
> AND sorted_column = 56789123456
> (same effect with an IN instead of the last equality)
>
> I checked the behavior in our cluster again and can now confirm Lars's
> explanation:
> - Filtering on stats works well as of 2.9, except when a parquet file
> contains only null values for the column
> - And that would be addressed by IMPALA-6113
> I was wrongly assuming that in some of my tables there would not have been
> enough null values to fill a parquet file for each partition.
>
> I wonder how you verify those stats in Parquet metadata. Parquet-tools
> currently does not print them.
>
> Thanks for the help
> Bruno
>
> On 27 Oct, 2017, at 12:54 AM, Lars Volker wrote:
>
> Hi Bruno,
>
> To clarify on your original observation: Column chunks that only have null
> values in them will not have their min_value and max_value fields populated
> and thus won't be skipped based on stats. I filed IMPALA-6113
> <https://issues.apache.org/jira/browse/IMPALA-6113> to track this.
> IMPALA-5061 <https://issues.apache.org/jira/browse/IMPALA-5061> added
> support to populate the null_count in statistics, allowing us to detect
> column chunks that only contain NULLs. We should use that information to
> skip row groups if the predicate allows us to.
>
> Row groups with column chunks that have at least one non-null value should
> get filtered correctly.
>
> Cheers, Lars
>
> On Thu, Oct 26, 2017 at 10:02 AM, Tim Armstrong wrote:
>
> Hi Bruno,
>
> Could you provide an example of the specific predicates that aren't being
> used to successfully skip the row group?
>
> - Tim
>
>
> On Thu, Oct 26, 2017 at 7:21 AM, Jeszy wrote:
>
> Hello Bruno,
>
> Thanks for bringing this up. While not apparent from the commit comments,
> this limitation was mentioned during the code review: 'min/max are only
> set when there are non-null values, so we don't consider statistics for
> "is null".' (see https://gerrit.cloudera.org/#/c/6147/).
> It looks to me that this was intended, but I'll let others confirm.
> Definitely a point where we can improve.
>
> Thanks!
>
> On 26 October 2017 at 08:02, Bruno Quinart wrote:
>
> > Hi all
> >
> > With IMPALA-2328, Parquet row group statistics are now being used to skip
> > the row group completely if the min/max range is excluded from the
> > predicate.
> > We have a use case in which we make sure the data is sorted on a 'key'
> > and then have many selective queries on that 'key' field. We notice a
> > significant performance increase.
> > So thanks a lot for all the work on that!
> >
> > One thing we notice is an unexpected behavior for records where that
> > 'key' has null values. It seems that as soon as null values are present
> > in a row group, the test on the min/max fails and the row group is read.
> >
> > We work with Impala 2.9. The data is put in parquet files by Impala
> > itself.
> > We have noticed this effect for both bigint and decimal fields. Note that
> > it's difficult for me to extract the min/max statistics from the parquet
> > files. The parquet-tools included in our distribution (5.12) is not the
> > latest. And I was told PARQUET-327 would anyway not print those row
> > group stats because of the way Impala stores them.
> > We do confirm the expected behavior (exactly one row group read for
> > properly sorted data) when we create a similar table but explicitly
> > filter out all null values for that 'key' field. We also notice that the
> > number of row groups read (but with zero records retained) is
> > proportional to the number of null values.
> >
> > Is this behavior expected?
> > Is there a fundamental reason those row groups cannot be skipped?
> >
> > Thanks!
> > Bruno
>


Re: Please hold off on merging new code changes again

2017-10-31 Thread Lars Volker
The two JIRAs breaking the exhaustive and ASAN builds (IMPALA-6123 and
IMPALA-6126) have been fixed and I think we can resume submitting changes
into master.

There are still a number of flaky-test JIRAs that we've seen in the past few
days. If one of them is assigned to you, please keep looking into it.

Thank you all for the help to get those issues resolved!

On Fri, Oct 27, 2017 at 7:31 PM, Lars Volker <l...@cloudera.com> wrote:

> After fixing IMPALA-6106 we've discovered IMPALA-6123, which breaks our
> exhaustive tests. In addition we've hit another new issue (IMPALA-6124) and
> 4 known issues we've seen in the previous week. The localfs build hit a
> stale snapshot, and ASAN hit a flaky issue that we thought was fixed, but
> seems to occur still (IMPALA-6004).
>
> I updated all relevant JIRAs with new information. If one of them is
> assigned to you, please have a look.
>
> Please hold off on merging new code changes unless they fix a flaky or
> broken test. I'll send an update once things look more stable.
>
> Cheers, Lars
>


Please hold off on merging new code changes again

2017-10-27 Thread Lars Volker
After fixing IMPALA-6106 we've discovered IMPALA-6123, which breaks our
exhaustive tests. In addition we've hit another new issue (IMPALA-6124) and
4 known issues we've seen in the previous week. The localfs build hit a
stale snapshot, and ASAN hit a flaky issue that we thought was fixed, but
seems to occur still (IMPALA-6004).

I updated all relevant JIRAs with new information. If one of them is
assigned to you, please have a look.

Please hold off on merging new code changes unless they fix a flaky or
broken test. I'll send an update once things look more stable.

Cheers, Lars


Re: Will there be a 2.12.0 release?

2017-10-24 Thread Lars Volker
I like Jim's idea of having a process to scope out 3.0.

Tim, can you think of features that are already lined up for 3.0 that we're
currently holding off on? If there are no pressing issues, I think it'd be
great to start working on scoping out 3.0 and produce 2.x releases until
then.

On Tue, Oct 24, 2017 at 10:19 AM, Jim Apple  wrote:

> Do we want to have a 3.0 process, where one person tracks all of the open
> breaking-change JIRAs and makes sure nothing gets accidentally left out? I
> ask this because, if the answer is "yes", we might make the 2.12 decision
> based on scope and quantity of 3.0 JIRAs.
>
> On Tue, Oct 24, 2017 at 10:07 AM, Tim Armstrong 
> wrote:
>
> > I was just retargeting some JIRAs from 2.11 to a later release. I'm
> > wondering if people had thoughts on whether we should have a 2.12 release
> > before 3.0?
> >
> > We have a lot of breaking changes queued up so I'm sure people are
> looking
> > forward to 3.0, but do we think there will be a minor release before
> then?
> >
>


Re: Using Gerrit drafts

2017-10-19 Thread Lars Volker
Note that publishing cannot be undone. In particular, once you have published
a change, subsequent pushes to refs/drafts will be public patch sets, too.

On Oct 19, 2017 09:34, "Philip Zeyliger"  wrote:

Hey folks,

This wasn't obvious for me, so I figured I'd share it. If you want to
review your Gerrit changes on the Gerrit UI before sending e-mail to the
community, you can run something like:

git push asf-gerrit HEAD:refs/drafts/master

This will give you a URL that you can browse to, and you can even run
https://jenkins.impala.io/view/Utility/job/pre-review-test/ against it. No
e-mails are sent!

Once you've looked it over, you can hit 'Publish' on the web UI, and, boom,
e-mails.

Cheers,

-- Philip


Re: [VOTE] Graduate to a TLP

2017-10-17 Thread Lars Volker
+1

On Oct 17, 2017 19:07, "Jim Apple"  wrote:

> Following our discussion
> https://lists.apache.org/thread.html/2f5db4788aff9b0557354b9106c032
> 8a29c1f90c1a74a228163949d2@%3Cdev.impala.apache.org%3E
> , I propose that we graduate to a TLP. According to
> https://incubator.apache.org/guides/graduation.html#
> community_graduation_vote
> this is not required, and https://impala.apache.org/bylaws.html does not
> say whose votes are "binding" in a graduation vote, so all community
> members are welcome to vote.
>
> This will remain open 72 hours. I will be notifying general@incubator that
> it is occurring.
>
> This is my +1.
>


Re: Time for graduation?

2017-10-15 Thread Lars Volker
I, too, think that Impala should graduate from the Incubator. In my eyes we
live by the standards and meet the expectations set by the ASF for its top
level projects.

On Sun, Oct 15, 2017 at 12:56 PM, Marcel Kornacker 
wrote:

> I am also in favor of graduation.
>
> I'd be happy to serve as the initial PMC chair.
>
> Marcel
>
> On Sun, Oct 15, 2017 at 11:52 AM, Daniel Hecht 
> wrote:
> > I'm in favor of graduation. We have come a long way and I agree it
> > feels like we are functioning like an Apache TLP.
> >
> > Dan
> >
> > On Fri, Oct 13, 2017 at 9:49 AM, Tim Armstrong 
> wrote:
> >> I'd be very happy if we could graduate too. We still obviously have to
> >> continue working on growing the community but we've made a huge amount
> of
> >> progress in setting up the infrastructure and processes to be
> successful as
> >> a top level project.
> >>
> >> I think having users like Brock on the PMC is great so that we can get
> >> input on whether the project is going in the right direction from their
> >> point of view.
> >>
> >> - Tim
> >>
> >> On Thu, Oct 12, 2017 at 4:46 PM, Brock Noland  wrote:
> >>
> >>> Hi all,
> >>>
> >>> I've been thinking about this as well and I feel Impala is ready.
> >>>
> >>> (more inline)
> >>>
> >>> On Thu, Oct 12, 2017 at 6:06 PM, Todd Lipcon 
> wrote:
> >>>
> >>> > On Thu, Oct 12, 2017 at 3:24 PM, Jim Apple 
> wrote:
> >>> >
> >>> > > Also, mentors are traditionally included in a graduating podling's
> PMC,
> >>> > > right?
> >>> >
> >>> > That's often been done but I don't think there's any hard
> requirement.
> >>> > Perhaps we could ask each mentor whether they would like to continue
> to
> >>> be
> >>> > involved?
> >>> >
> >>>
> >>> For my part, I don't feel I contribute much to the PMC, but Impala is a
> >>> project I use everyday and thus have a strong interest in the project
> being
> >>> successful. I would not be hurt in the *least* if I was not included
> on the
> >>> PMC. However, I'd be more than happy to serve.
> >>>
> >>> Cheers,
> >>> Brock
> >>>
>


Rebase of all changes required

2017-10-13 Thread Lars Volker
Hi All,

Today we upgraded jenkins.impala.io to Ubuntu 16.04 and the latest Jenkins
version 2.73.2. This was necessary to address security issues raised in a
recent advisory by the Jenkins developers.

Before submitting new changes you will need to rebase them onto a branch
that has the fix for IMPALA-6045 (2187d36b).
After that you should be good to go.

Cheers, Lars


ZSH Users: Request for Review

2017-09-15 Thread Lars Volker
Hi All,

I've recently pushed a change to fix "enable_distcc" when using ZSH. Now
I'm looking for someone using ZSH who can have a look:
https://gerrit.cloudera.org/#/c/8049/

Thanks, Lars


Re: vim / Eclipse setups for new developers, on the C++ side

2017-09-15 Thread Lars Volker
I use mosh + tmux + vim to develop on my desktop machine, and YouCompleteMe
for completion and navigation (jump to definition only). Recently I found
out that YCM also can jump to the declaration in Python. :)

To debug the Frontend, I check out the matching revision on my laptop and
open it in Eclipse there. Then I forward the debug ports from my desktop
machine. That gives me a better experience than running Eclipse in a remote
X server.

On Fri, Sep 15, 2017 at 7:36 AM, Shant Hovsepian 
wrote:

> Pretty happy with using clang_complete in vim for C++ code in impala.
> Memory usage can get a bit high.
>
> clang_complete's quickfix feature is nice for showing errors; I don't care
> for the periodic mode and instead prefer to bind it and run it as part of a :w
>
> -Shant
>
> On Wed, Sep 13, 2017 at 8:34 PM, 俊杰陈  wrote:
>
> > I use NetBeans to view the code, the "show call graph" is useful to me.
> >
> > 2017-09-14 5:44 GMT+08:00 Tim Armstrong :
> >
> > > For a long time I've just used GNU screen + VIM with syntax
> highlighting.
> > > Then "git grep" or search in VIM as needed to find things. Obviously
> not
> > > ideal for everyone.
> > >
> > > I've tried YouCompleteMe recently and it works fairly well but hasn't
> > been
> > > a game-changer for me. Jumping to definitions is handy sometimes but I
> > > haven't found that it's changed my workflow that much.
> > >
> > > On Wed, Sep 13, 2017 at 2:18 PM, Philip Zeyliger 
> > > wrote:
> > >
> > > > Hi folks,
> > > >
> > > > I'm querying what folks use for working on the C++ side of the code
> > base.
> > > > I'm specifically interested in navigation tools for vim (better than
> > > > ctags), error-highlighting tools for vim (showing syntax errors and
> > such
> > > > "live"), and Eclipse integration (yes, I've seen the wiki
> > > >  > > > Eclipse+Setup+for+Impala+Development>
> > > > ).
> > > >
> > > > I'll be happy to collate and update
> > > > https://cwiki.apache.org/confluence/display/IMPALA/
> > > > Useful+Tips+for+New+Impala+Developers
> > > > (or other appropriate pages) once I get some feedback!
> > > >
> > > > Thanks!
> > > >
> > > > -- Philip
> > > >
> > >
> >
> >
> >
> > --
> > Thanks & Best Regards
> >
>


Re: Build broken by b66af0357e - IMPALA-5854: Update external hadoop versions

2017-09-01 Thread Lars Volker
Yes, it looks like I had to remove $IMPALA_TOOLCHAIN/cdh_components. Thanks
Tim and Zach for the quick help.

On Fri, Sep 1, 2017 at 10:19 AM, Tim Armstrong <tarmstr...@cloudera.com>
wrote:

> It works ok for me. Maybe you need to re-source impala-config.sh or
> re-bootstrap to pull down the new components?
>
> On Fri, Sep 1, 2017 at 10:17 AM, Lars Volker <l...@cloudera.com> wrote:
>
> > It looks like the recent hadoop version update broke the build for me. I
> > get this error; with the previous commit it still works. Anyone else
> seeing
> > this?
> >
> > be/src/common/hdfs.h:30:18: fatal error: hdfs.h: No such file or directory
> >  #include <hdfs.h>
> >
>


Build broken by b66af0357e - IMPALA-5854: Update external hadoop versions

2017-09-01 Thread Lars Volker
It looks like the recent hadoop version update broke the build for me. I
get this error; with the previous commit it still works. Anyone else seeing
this?

be/src/common/hdfs.h:30:18: fatal error: hdfs.h: No such file or directory
 #include <hdfs.h>


Re: percentile function in impala

2017-08-29 Thread Lars Volker
Impala currently doesn't have percentile functions. Work on those is
tracked in IMPALA-3602 and contributions are always welcome. You might be
able to get what you need using analytic functions, e.g. NTILE:
https://www.cloudera.com/documentation/enterprise/latest/topics/impala_analytic_functions.html#ntile
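
Until IMPALA-3602 is implemented, one possible stopgap (besides NTILE) is to fetch the column back and compute percentiles client-side. This is an editorial sketch using only the Python standard library on made-up data, not an actual Impala connection (fetching via a driver such as impyla is left out):

```python
import statistics

# Stand-in for values fetched from the HBase-backed table via Impala.
latencies = list(range(1, 101))

# quantiles(n=100) returns the 99 cut points p1..p99; "inclusive" treats the
# data as the full population (min and max are attainable values).
cuts = statistics.quantiles(latencies, n=100, method="inclusive")
p95, p99 = cuts[94], cuts[98]
print(p95, p99)  # 95.05 99.01 for this data
```

Note this pulls all values to the client, so it only scales to result sets that fit in memory; for large tables NTILE or an approximate aggregate is the better route.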

On Tue, Aug 29, 2017 at 8:39 AM, Gayathri Devi 
wrote:

> Hi,
>
> I have a table on HBase that I want to query with Impala to calculate the
> 95th and 99th percentiles. Is there any built-in function or user-defined
> function available?
>


Re: jenkins.impala.io maintenance

2017-08-08 Thread Lars Volker
Thank you for doing this, Michael!

On Mon, Aug 7, 2017 at 9:59 PM, Michael Ho  wrote:

> Jenkins plugins have been updated. Aborted GVD jobs were restarted.
>
> On Mon, Aug 7, 2017 at 5:25 PM, Michael Ho  wrote:
>
> > jenkins.impala.io needs updates for some plugins to address a new
> > security advisory.
> >
> > It will be put into maintenance mode at 5:35pm PST so no new jobs can be
> > submitted. The upgrade will happen after all pending jobs complete. Once
> > the upgrade completes, another email will be sent out.
> >
> > Please speak up if you have any objection to the above.
> >
> > --
> > Thanks,
> > Michael
> >
>
>
>
> --
> Thanks,
> Michael
>


Re: Problem running Impala built with dynamic linking

2017-07-26 Thread Lars Volker
Vincent, did you build without ninja but with -so enabled? That still gives
me the same error.

On Fri, Jul 21, 2017 at 12:01 PM, Vincent Tran  wrote:

> This happened to me last evening. I turned off ninja build and did a full
> build. Seems to have fixed it.
> Not sure if it's the exact same root cause here though (if you don't use
> ninja).
>
> On Fri, Jul 21, 2017 at 2:54 PM, Matthew Jacobs  wrote:
>
> > Did you ever come to a conclusion on this?
> >
> > On Wed, Jul 19, 2017 at 5:14 PM, Bikramjeet Vig
> >  wrote:
> > > Didn't work, heres the output of rm CMakeCache.txt && cmake .
> > >
> > >
> > > -- Setup toolchain link flags
> > > -Wl,-rpath,/home/bikram/dev/Impala/toolchain/gcc-4.9.2/lib64
> > > -L/home/bikram/dev/Impala/toolchain/gcc-4.9.2/lib64
> > > -- Build type is DEBUG
> > > -- ENABLE_CODE_COVERAGE: false
> > > -- Boost version: 1.57.0
> > > -- Found the following Boost libraries:
> > > --   thread
> > > --   regex
> > > --   filesystem
> > > --   system
> > > --   date_time
> > > -- Boost include dir:
> > > /home/bikram/dev/Impala/toolchain/boost-1.57.0-p3/include
> > > -- Boost libraries:
> > > /home/bikram/dev/Impala/toolchain/boost-1.57.0-p3/lib/libboost_thread.a
> > > /home/bikram/dev/Impala/toolchain/boost-1.57.0-p3/lib/libboost_regex.a
> > > /home/bikram/dev/Impala/toolchain/boost-1.57.0-p3/lib/libboost_filesystem.a
> > > /home/bikram/dev/Impala/toolchain/boost-1.57.0-p3/lib/libboost_system.a
> > > /home/bikram/dev/Impala/toolchain/boost-1.57.0-p3/lib/libboost_date_time.a
> > > -- Found OpenSSL:
> > > /usr/lib/x86_64-linux-gnu/libssl.so;/usr/lib/x86_64-linux-gnu/libcrypto.so
> > > (found version "1.0.2g")
> > > -- --> Adding thirdparty library openssl_ssl. <--
> > > -- Header files: /usr/include
> > > -- Added shared library dependency openssl_ssl:
> > > /usr/lib/x86_64-linux-gnu/libssl.so
> > > -- --> Adding thirdparty library openssl_crypto. <--
> > > -- Added shared library dependency openssl_crypto:
> > > /usr/lib/x86_64-linux-gnu/libcrypto.so
> > > -- Bzip2: /home/bikram/dev/Impala/toolchain/bzip2-1.0.6-p2/include
> > > -- --> Adding thirdparty library bzip2. <--
> > > -- Header files: /home/bikram/dev/Impala/toolchain/bzip2-1.0.6-p2/include
> > > -- Added static library dependency bzip2:
> > > /home/bikram/dev/Impala/toolchain/bzip2-1.0.6-p2/lib/libbz2.a
> > > -- Zlib: /home/bikram/dev/Impala/toolchain/zlib-1.2.8/include
> > > -- --> Adding thirdparty library zlib. <--
> > > -- Header files: /home/bikram/dev/Impala/toolchain/zlib-1.2.8/include
> > > -- Added static library dependency zlib:
> > > /home/bikram/dev/Impala/toolchain/zlib-1.2.8/lib/libz.a
> > > -- --> Adding thirdparty library hdfs. <--
> > > -- Header files:
> > > /home/bikram/dev/Impala/toolchain/cdh_components/hadoop-2.6.0-cdh5.13.0-SNAPSHOT/include
> > > -- Added static library dependency hdfs:
> > > /home/bikram/dev/Impala/toolchain/cdh_components/hadoop-2.6.0-cdh5.13.0-SNAPSHOT/lib/native/libhdfs.a
> > > -- --> Adding thirdparty library glog. <--
> > > -- Header files: /home/bikram/dev/Impala/toolchain/glog-0.3.4-p2/include
> > > -- Added static library dependency glog:
> > > /home/bikram/dev/Impala/toolchain/glog-0.3.4-p2/lib/libglog.a
> > > -- --> Adding thirdparty library gflags. <--
> > > -- Header files: /home/bikram/dev/Impala/toolchain/gflags-2.2.0-p1/include
> > > -- Added static library dependency gflags:
> > > /home/bikram/dev/Impala/toolchain/gflags-2.2.0-p1/lib/libgflags.a
> > > -- --> Adding thirdparty library pprof. <--
> > > -- Header files: /home/bikram/dev/Impala/toolchain/gperftools-2.5/include
> > > -- Added static library dependency pprof:
> > > /home/bikram/dev/Impala/toolchain/gperftools-2.5/lib/libprofiler.a
> > > -- --> Adding thirdparty library gtest. <--
> > > -- Header files: /home/bikram/dev/Impala/toolchain/gtest-1.6.0/include
> > > -- Added static library dependency gtest:
> > > /home/bikram/dev/Impala/toolchain/gtest-1.6.0/lib/libgtest.a
> > > -- LLVM llvm-config found at:
> > > /home/bikram/dev/Impala/toolchain/llvm-3.8.0-p1/bin/llvm-config
> > > -- LLVM clang++ found at:
> > > /home/bikram/dev/Impala/toolchain/llvm-3.8.0-p1/bin/clang++
> > > -- LLVM opt found at:
> > > /home/bikram/dev/Impala/toolchain/llvm-3.8.0-p1/bin/opt
> > > -- LLVM_ROOT: /home/bikram/dev/Impala/toolchain/llvm-3.8.0-asserts-p1
> > > -- LLVM llvm-config found at:
> > > /home/bikram/dev/Impala/toolchain/llvm-3.8.0-asserts-p1/bin/llvm-config
> > > -- LLVM include dir:
> > > /home/bikram/dev/Impala/toolchain/llvm-3.8.0-asserts-p1/include
> > > -- LLVM lib dir: /home/bikram/dev/Impala/toolchain/llvm-3.8.0-asserts-p1/lib
> > > -- --> Adding thirdparty library cyrus_sasl. <--
> > > -- Header files: /usr/include
> > > -- Added shared library 

Re: jenkins.impala.io switching to SSL

2017-07-19 Thread Lars Volker
Hi All,

I completed the Jenkins reconfiguration that I announced last night.
Jenkins can now be reached at https://jenkins.impala.io and all previous
URLs redirect there permanently. From now on it will post https:// links in
code reviews. Links posted in old code reviews should still work. I found
two jobs that were aborted by the restart and I kicked off new builds for
those. If one of your jobs got killed, please make sure to restart it,
too.

Unless we discover any issues with the new configuration there should be no
more interruptions. Thank you for your patience.

Cheers, Lars

On Tue, Jul 18, 2017 at 1:43 PM, Lars Volker <l...@cloudera.com> wrote:

> Hi All,
>
> Jenkins has been running with SSL for the past few days and I haven't
> received any complaints. If no-one objects, tomorrow morning (Wednesday,
> PST) I will configure http://jenkins.impala.io:8080/ to redirect to
> https://jenkins.impala.io. From that point on, Jenkins will also post
> links to its https endpoint in code reviews.
>
> Let me know if you have any questions or concerns.
>
> Cheers, Lars
>
> On Fri, Jul 14, 2017 at 10:55 PM, Lars Volker <l...@cloudera.com> wrote:
>
>> Hi All,
>>
>> our Jenkins instance now has a proper SSL certificate and can be reached
>> at https://jenkins.impala.io. The old redirect from http://j.i.o now
>> points to the SSL endpoint instead of port 8080.
>>
>> If you run into any issues with the SSL setup, please let me know. As a
>> workaround you can still access Jenkins directly at
>> http://jenkins.impala.io:8080/. If no-one reports any issues in the next
>> few days, I will eventually make that URL redirect to SSL, too, so all
>> connections will be secured.
>>
>> Cheers, Lars
>>
>
>


Re: jenkins.impala.io switching to SSL

2017-07-18 Thread Lars Volker
Hi All,

Jenkins has been running with SSL for the past few days and I haven't
received any complaints. If no-one objects, tomorrow morning (Wednesday,
PST) I will configure http://jenkins.impala.io:8080/ to redirect to
https://jenkins.impala.io. From that point on, Jenkins will also post links
to its https endpoint in code reviews.

Let me know if you have any questions or concerns.

Cheers, Lars

On Fri, Jul 14, 2017 at 10:55 PM, Lars Volker <l...@cloudera.com> wrote:

> Hi All,
>
> our Jenkins instance now has a proper SSL certificate and can be reached
> at https://jenkins.impala.io. The old redirect from http://j.i.o now
> points to the SSL endpoint instead of port 8080.
>
> If you run into any issues with the SSL setup, please let me know. As a
> workaround you can still access Jenkins directly at
> http://jenkins.impala.io:8080/. If no-one reports any issues in the next
> few days, I will eventually make that URL redirect to SSL, too, so all
> connections will be secured.
>
> Cheers, Lars
>


jenkins.impala.io switching to SSL

2017-07-14 Thread Lars Volker
Hi All,

our Jenkins instance now has a proper SSL certificate and can be reached at
https://jenkins.impala.io. The old redirect from http://j.i.o now points to
the SSL endpoint instead of port 8080.

If you run into any issues with the SSL setup, please let me know. As a
workaround you can still access Jenkins directly at
http://jenkins.impala.io:8080/. If no-one reports any issues in the next
few days, I will eventually make that URL redirect to SSL, too, so all
connections will be secured.

Cheers, Lars


Re: Add total number of started threads to /threadz

2017-07-10 Thread Lars Volker
Thank you for the feedback. I opened IMPALA-5643 to track this and pushed a
change here: https://gerrit.cloudera.org/#/c/7390/

On Mon, Jul 10, 2017 at 10:25 AM, Henry Robinson <he...@apache.org> wrote:

> This seems valuable to me.
>
> On 28 June 2017 at 21:26, Lars Volker <l...@cloudera.com> wrote:
>
> > Hi All,
> >
> > While investigating IMPALA-5598 I added a counter with the total number
> of
> > threads to /threadz. See below for what it looks like (I hope the ASF
> > mailer won't eat the format). Does this look helpful? If someone thinks
> it
> > does, I'll create a JIRA and push the change.
> >
> > Thanks, Lars
> >
> >
> > Thread Groups / All threads
> > DataStreamSender: (running: 0, total created: 2500)
> > common: (running: 2, total created: 2)
> > coordinator-fragment-rpc: (running: 12, total created: 12)
> > disk-io-mgr: (running: 34, total created: 34)
> > fragment-mgr: (running: 0, total created: 2550)
> > hdfs-scan-node: (running: 0, total created: 2500)
> > hdfs-worker-pool: (running: 16, total created: 16)
> > impala-server: (running: 8, total created: 8)
> > plan-fragment-executor: (running: 0, total created: 2550)
> > query-exec-state: (running: 0, total created: 50)
> > rpc-pool: (running: 8, total created: 8)
> > scheduling: (running: 1, total created: 1)
> > setup-server: (running: 2, total created: 2)
> > statestore-subscriber: (running: 1, total created: 1)
> > thrift-server: (running: 248, total created: 248)
>


Re: Jenkins maintenance in 30 minutes

2017-07-10 Thread Lars Volker
Hi All,

Jenkins maintenance has been completed and the service should be working
again. However, some plugins issued a warning that parts of their
configuration format have changed and thus jobs may need to be
reconfigured. I started a canary job here that is currently still running:
http://jenkins.impala.io:8080/job/parallel-all-tests/1035/console

If you encounter any issues during job creation, please let me know.

Cheers, Lars

On Mon, Jul 10, 2017 at 11:32 AM, Lars Volker <l...@cloudera.com> wrote:

> Hi All,
>
> In about 30 minutes, jenkins.impala.io will become unavailable for
> maintenance. I expect it to take no more than 30 minutes and will send an
> additional email to this list once the service is back online. In case your
> jobs get terminated, please restart them.
>
> Thank you, Lars
>


Jenkins maintenance in 30 minutes

2017-07-10 Thread Lars Volker
Hi All,

In about 30 minutes, jenkins.impala.io will become unavailable for
maintenance. I expect it to take no more than 30 minutes and will send an
additional email to this list once the service is back online. In case your
jobs get terminated, please restart them.

Thank you, Lars


Add total number of started threads to /threadz

2017-06-28 Thread Lars Volker
Hi All,

While investigating IMPALA-5598 I added a counter with the total number of
threads to /threadz. See below for what it looks like (I hope the ASF
mailer won't eat the format). Does this look helpful? If someone thinks it
does, I'll create a JIRA and push the change.

Thanks, Lars


Thread Groups / All threads
DataStreamSender: (running: 0, total created: 2500)
common: (running: 2, total created: 2)
coordinator-fragment-rpc: (running: 12, total created: 12)
disk-io-mgr: (running: 34, total created: 34)
fragment-mgr: (running: 0, total created: 2550)
hdfs-scan-node: (running: 0, total created: 2500)
hdfs-worker-pool: (running: 16, total created: 16)
impala-server: (running: 8, total created: 8)
plan-fragment-executor: (running: 0, total created: 2550)
query-exec-state: (running: 0, total created: 50)
rpc-pool: (running: 8, total created: 8)
scheduling: (running: 1, total created: 1)
setup-server: (running: 2, total created: 2)
statestore-subscriber: (running: 1, total created: 1)
thrift-server: (running: 248, total created: 248)



Impalad fails to start after recent OS / Kernel upgrade

2017-06-27 Thread Lars Volker
Hi All,

A recent patch to the Linux kernel made changes to the stack guard gap,
which results in Impalad failing to start its
JVM. If you recently upgraded your OS kernel or have automatic updates
enabled you may find error messages similar to the following during startup:

# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGBUS (0x7) at pc=0x7f028e8e046f, pid=3044, tid=139649127274624


The effects on Impala are documented in IMPALA-5578.

As a workaround you can pass a higher value of -Xss to the JVM by adding
this to your environment.

export JAVA_TOOL_OPTIONS="-Xss2m"


Alternatively you can pin your kernel to the version from before the
upgrade. Information on how to do so is available in the apt-get and yum
documentation.

Let's use the JIRA for further discussion.

Cheers, Lars


Re: Upcoming Jenkins maintenance at 7pm PST

2017-05-30 Thread Lars Volker
The update has been completed, Jenkins is running again. I tried to restart
all jobs that were running before installing the upgrades. If you notice
anything odd, please let me know.

On Tue, May 30, 2017 at 6:20 PM, Lars Volker <l...@cloudera.com> wrote:

> At 7pm PST jenkins.impala.io will go offline to install security updates.
> I will notify this list once the updates have been completed.
>
> Cheers, Lars
>


Upcoming Jenkins maintenance at 7pm PST

2017-05-30 Thread Lars Volker
At 7pm PST jenkins.impala.io will go offline to install security updates. I
will notify this list once the updates have been completed.

Cheers, Lars


Re: [DISCUSS] Release 2.9.0 soon?

2017-05-22 Thread Lars Volker
Impala has not released 2.9.0 yet. We follow the ASF release process, which
is outlined here: http://www.apache.org/dev/release-publishing.html

The feedback to Taras' proposal seems positive, so he'll likely provide a
first release candidate soon.

For an overview you can search for JIRAs with fixVersion set to "Impala
2.9.0", but the list often has inaccuracies and will change until the final
release is published.

On Mon, May 22, 2017 at 11:51 AM, yu feng  wrote:

> where can I find the release notes about 2.9.0?  thanks.
>
> 2017-05-20 6:59 GMT+08:00 Tim Armstrong :
>
> > +1 Thanks for volunteering. It would be great to get a release done -
> it's
> > been quite a while since the last one.
> >
> > On Fri, May 19, 2017 at 5:53 PM, Bharath Vissapragada <
> > bhara...@cloudera.com
> > > wrote:
> >
> > > +1. Good to have a new release with all the latest improvements.
> > >
> > > On Fri, May 19, 2017 at 3:51 PM, Alexander Behm <
> alex.b...@cloudera.com>
> > > wrote:
> > >
> > > > +1 for doing a release
> > > >
> > > > On Fri, May 19, 2017 at 3:41 PM, Taras Bobrovytsky <
> > taras...@apache.org>
> > > > wrote:
> > > >
> > > > > This is not a [VOTE] thread. Everyone is encouraged to participate.
> > > > >
> > > > > I am volunteering to be a release manager for Impala 2.9.0. Are
> there
> > > any
> > > > > objections to releasing 2.9.0 soon?
> > > > > Keep in mind this is NOT your last chance to speak - there will be
> at
> > > > least
> > > > > two votes, one for PPMC releasing and one for IPMC releasing.
> > > > >
> > > > > See
> > > > > https://cwiki.apache.org/confluence/display/IMPALA/
> > > > DRAFT%3A+How+to+Release
> > > > >
> > > >
> > >
> >
>


Re: Heads-up - manual toolchain update required soon

2017-05-14 Thread Lars Volker
I recently bumped the Breakpad version in the toolchain repo and now would
like to pull that into master. The change to do so is here:
https://gerrit.cloudera.org/#/c/6883/

Should I wait until gflags has been pulled into master and rebase? Or would
you like me to bump gflags in my change, too?

On Tue, May 9, 2017 at 12:21 AM, Henry Robinson  wrote:

> I'm about to start the process of getting IMPALA-5174 committed to the
> toolchain. This patch changes gflags to allow 'hidden' flags that won't
> show up on /varz etc.
>
> The toolchain glog has a dependency on gflags, meaning that the installed
> glog library needs to be built against the installed gflag library. So when
> the new gflag gets pulled in, you will need the new glog as well.
>
> However, the toolchain scripts won't detect that anything has changed for
> glog, because there's no version number change (changing the toolchain
> build ID doesn't cause the toolchain scripts to invalidate dependencies).
>
> Rather than introduce a spurious version bump with an empty patch file or
> something, I figured in this case it's easiest to ask developers to
> manually delete their local glog, and then bin/bootstrap_toolchain.py will
> download the most recent glog that's built against gflag. This is a
> one-time thing.
>
> I'll send out instructions about how to do this when the toolchain is
> updated.
>


Re: test_insert.py failing with ''File does not exist: ..."

2017-05-14 Thread Lars Volker
I reloaded my local cluster from a snapshot before seeing your mail and
that fixed it. I assume reloading the single table would have worked, too.
Thank you for your help!

On Sun, May 14, 2017 at 12:07 AM, Alexander Behm <alex.b...@cloudera.com>
wrote:

> Have you tried reloading the alltypesnopart_insert table?
>
> bin/load-data.py -f -w functional-query --table_names=alltypesnopart_
> insert
>
> You may have to run this first:
>
> bin/create_testdata.sh
>
>
> On Sat, May 13, 2017 at 2:44 PM, Lars Volker <l...@cloudera.com> wrote:
>
> > I cannot run test_insert.py anymore on master. I tried clean.sh, rebuilt
> > from scratch, removed the whole toolchain, but it still won't work. On
> > first glance it looks like the test setup code tries to drop the Hive
> > default partition but cannot find a file for it. Has anyone seen this
> error
> > before? Could this be related to the cdh_components update? Thanks, Lars
> >
> > -- executing against localhost:21000
> > select count(*) from alltypesnopart_insert;
> >
> > FAILED-- closing connection to: localhost:21000
> >
> > = short test
> > summary info =
> > FAIL
> > tests/query_test/test_insert.py::TestInsertQueries::()::
> > test_insert[exec_option:
> > {'batch_size': 0, 'num_nodes': 0, 'sync_ddl': 0, 'disable_codegen':
> False,
> > 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} |
> table_format:
> > text/none]
> >
> > 
> FAILURES
> > =
> >  TestInsertQueries.test_insert[exec_option: {'batch_size': 0,
> 'num_nodes':
> > 0, 'sync_ddl': 0, 'disable_codegen': False, 'abort_on_error': 1,
> > 'exec_single_node_rows_threshold': 0} | table_format: text/none]
> > tests/query_test/test_insert.py:119: in test_insert
> > multiple_impalad=vector.get_value('exec_option')['sync_ddl'] == 1)
> > tests/common/impala_test_suite.py:332: in run_test_case
> > self.execute_test_case_setup(test_section['SETUP'],
> table_format_info)
> > tests/common/impala_test_suite.py:448: in execute_test_case_setup
> > self.__drop_partitions(db_name, table_name)
> > tests/common/impala_test_suite.py:596: in __drop_partitions
> > partition, True), 'Could not drop partition: %s' % partition
> > shell/gen-py/hive_metastore/ThriftHiveMetastore.py:2513: in
> > drop_partition_by_name
> > return self.recv_drop_partition_by_name()
> > shell/gen-py/hive_metastore/ThriftHiveMetastore.py:2541: in
> > recv_drop_partition_by_name
> > raise result.o2
> > E   MetaException: MetaException(_message='File does not exist:
> > /test-warehouse/functional.db/alltypesinsert/year=__HIVE_
> > DEFAULT_PARTITION__\n\tat
> > org.apache.hadoop.hdfs.server.namenode.FSDirectory.
> > getContentSummary(FSDirectory.java:2296)\n\tat
> > org.apache.ha
> > doop.hdfs.server.namenode.FSNamesystem.getContentSummary(
> > FSNamesystem.java:4545)\n\tat
> > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.
> > getContentSummary(NameNodeRpcServer.java:1087)\n\tat
> > org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderP
> > roxyClientProtocol.getContentSummary(AuthorizationProviderProxyClie
> > ntProtocol.java:563)\n\tat
> > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSi
> > deTranslatorPB.getContentSummary(ClientNamenodeProtocolServerSi
> > deTranslatorPB.java:873)\n\tat
> > org.ap
> > ache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$
> > ClientNamenodeProtocol$2.callBlockingMethod(
> ClientNamenodeProtocolProtos.
> > java)\n\tat
> > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(
> > ProtobufRpcEngine.java:617)\n\tat
> > org.apa
> > che.hadoop.ipc.RPC$Server.call(RPC.java:1073)\n\tat
> > org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2217)\n\tat
> > org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2213)\n\tat
> > java.security.AccessController.doPrivileged(Native Method)\n\tat javax.s
> > ecurity.auth.Subject.doAs(Subject.java:415)\n\tat
> > org.apache.hadoop.security.UserGroupInformation.doAs(
> > UserGroupInformation.java:1917)\n\tat
> > org.apache.hadoop.ipc.Server$Handler.run(Server.java:2211)\n')
> > ! Interrupted: stopping
> > after 1 failures !!
> > === 1 failed, 1 passed
> in
> > 22.29 seconds ===
> >
>


test_insert.py failing with ''File does not exist: ..."

2017-05-13 Thread Lars Volker
I cannot run test_insert.py anymore on master. I tried clean.sh, rebuilt
from scratch, removed the whole toolchain, but it still won't work. On
first glance it looks like the test setup code tries to drop the Hive
default partition but cannot find a file for it. Has anyone seen this error
before? Could this be related to the cdh_components update? Thanks, Lars

-- executing against localhost:21000
select count(*) from alltypesnopart_insert;

FAILED-- closing connection to: localhost:21000

= short test
summary info =
FAIL
tests/query_test/test_insert.py::TestInsertQueries::()::test_insert[exec_option:
{'batch_size': 0, 'num_nodes': 0, 'sync_ddl': 0, 'disable_codegen': False,
'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format:
text/none]

 FAILURES
=
 TestInsertQueries.test_insert[exec_option: {'batch_size': 0, 'num_nodes':
0, 'sync_ddl': 0, 'disable_codegen': False, 'abort_on_error': 1,
'exec_single_node_rows_threshold': 0} | table_format: text/none]
tests/query_test/test_insert.py:119: in test_insert
multiple_impalad=vector.get_value('exec_option')['sync_ddl'] == 1)
tests/common/impala_test_suite.py:332: in run_test_case
self.execute_test_case_setup(test_section['SETUP'], table_format_info)
tests/common/impala_test_suite.py:448: in execute_test_case_setup
self.__drop_partitions(db_name, table_name)
tests/common/impala_test_suite.py:596: in __drop_partitions
partition, True), 'Could not drop partition: %s' % partition
shell/gen-py/hive_metastore/ThriftHiveMetastore.py:2513: in
drop_partition_by_name
return self.recv_drop_partition_by_name()
shell/gen-py/hive_metastore/ThriftHiveMetastore.py:2541: in
recv_drop_partition_by_name
raise result.o2
E   MetaException: MetaException(_message='File does not exist:
/test-warehouse/functional.db/alltypesinsert/year=__HIVE_DEFAULT_PARTITION__\n\tat
org.apache.hadoop.hdfs.server.namenode.FSDirectory.getContentSummary(FSDirectory.java:2296)\n\tat
org.apache.ha
doop.hdfs.server.namenode.FSNamesystem.getContentSummary(FSNamesystem.java:4545)\n\tat
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getContentSummary(NameNodeRpcServer.java:1087)\n\tat
org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderP
roxyClientProtocol.getContentSummary(AuthorizationProviderProxyClientProtocol.java:563)\n\tat
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getContentSummary(ClientNamenodeProtocolServerSideTranslatorPB.java:873)\n\tat
org.ap
ache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)\n\tat
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)\n\tat
org.apa
che.hadoop.ipc.RPC$Server.call(RPC.java:1073)\n\tat
org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2217)\n\tat
org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2213)\n\tat
java.security.AccessController.doPrivileged(Native Method)\n\tat javax.s
ecurity.auth.Subject.doAs(Subject.java:415)\n\tat
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1917)\n\tat
org.apache.hadoop.ipc.Server$Handler.run(Server.java:2211)\n')
! Interrupted: stopping
after 1 failures !!
=== 1 failed, 1 passed in
22.29 seconds ===


Re: Should we change tests so they don't use single letter table names?

2017-05-12 Thread Lars Volker
Looking at AnalyzeDDLTest alone, it's full of "t", "p", "tbl", "test",
"foo", "bar" and the like. Fixing them often means overflowing a line and
fixing line breaks, so it seems like a bit more effort. Maybe it's better
to postpone this until after the release.

On Fri, May 12, 2017 at 6:11 PM, Alexander Behm <alex.b...@cloudera.com>
wrote:

> Tim, I think Michael was not suggesting to drop your tables, but instead
> create/drop new unique tables in each test like we do in EE tests.
>
> Yes, I think we should tackle this. I frequently run into this problem with
> a "foo" table :)
>
> On Fri, May 12, 2017 at 8:59 AM, Lars Volker <l...@cloudera.com> wrote:
>
> > Yes, they are in the default db. I think the easiest way to go about this
> > is to create 26 tables in default with a script and then rename tables in
> > the FE tests until we catch all of them. Or try to grep for the offending
> > tests. :)
> >
> > There seems to be some consensus that we should tackle this, so I'll
> open a
> > JIRA.
> >
> > On Fri, May 12, 2017 at 5:49 PM, Tim Armstrong <tarmstr...@cloudera.com>
> > wrote:
> >
> > > Personally I'd prefer the frontend test to fail instead of dropping my
> > > table without warning. I assume these tables are in the default
> database,
> > > right?
> > >
> > > On Fri, May 12, 2017 at 8:43 AM, Alexander Behm <
> alex.b...@cloudera.com>
> > > wrote:
> > >
> > > > Michael, to keep them fast and self-contained the FE tests do not
> > > require a
> > > > running Impala cluster, and as such cannot really execute any
> > statements
> > > > (e.g. DROP/ADD).
> > > >
> > > > The FE has limited mechanisms for setting up temporary tables which
> > might
> > > > suffice in most but not all cases.
> > > >
> > > > I agree with Lars that we should address this issue. We need to look
> > at a
> > > > few cases and see if there's a sledgehammer solution we can apply.
> > > >
> > > > On Fri, May 12, 2017 at 7:21 AM, Michael Brown <mi...@cloudera.com>
> > > wrote:
> > > >
> > > > > Why not alter the frontend test to drop t if exists? Tests should
> > > > generally
> > > > > strive to set themselves up. Is there some trait of the frontend
> > tests
> > > > that
> > > > > prevents that?
> > > > >
> > > > > On Fri, May 12, 2017 at 4:38 AM, Lars Volker <l...@cloudera.com>
> > wrote:
> > > > >
> > > > > > Hi All,
> > > > > >
> > > > > > I frequently create test tables on my local system with names
> like
> > > "t"
> > > > or
> > > > > > "p". A couple of frontend tests use the same names and then fail
> > with
> > > > > > "Table already exists".
> > > > > >
> > > > > > Does anyone else hit this from time to time? Can we change the
> > table
> > > > > names
> > > > > > in the tests to avoid single letter names? If there are no
> > > objections,
> > > > > I'll
> > > > > > open a JIRA.
> > > > > >
> > > > > > Thanks, Lars
> > > > > >
> > > > >
> > > >
> > >
> >
>


Re: Should we change tests so they don't use single letter table names?

2017-05-12 Thread Lars Volker
Yes, they are in the default db. I think the easiest way to go about this
is to create 26 tables in default with a script and then rename tables in
the FE tests until we catch all of them. Or try to grep for the offending
tests. :)

There seems to be some consensus that we should tackle this, so I'll open a
JIRA.
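One way to avoid such collisions is for each FE test to pick a name that
cannot clash with a developer's local tables. A sketch (the
`unique_table_name` helper is hypothetical, not existing Impala test code):

```python
import uuid


def unique_table_name(prefix="test_tbl"):
    # Append a random suffix so a test's table can never collide with a
    # developer's local single-letter tables like "t" or "p".
    return "%s_%s" % (prefix, uuid.uuid4().hex[:8])
```

Each call yields a fresh name, so tests stay self-contained without needing
a running cluster to issue DROP IF EXISTS statements.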

On Fri, May 12, 2017 at 5:49 PM, Tim Armstrong <tarmstr...@cloudera.com>
wrote:

> Personally I'd prefer the frontend test to fail instead of dropping my
> table without warning. I assume these tables are in the default database,
> right?
>
> On Fri, May 12, 2017 at 8:43 AM, Alexander Behm <alex.b...@cloudera.com>
> wrote:
>
> > Michael, to keep them fast and self-contained the FE tests do not
> require a
> > running Impala cluster, and as such cannot really execute any statements
> > (e.g. DROP/ADD).
> >
> > The FE has limited mechanisms for setting up temporary tables which might
> > suffice in most but not all cases.
> >
> > I agree with Lars that we should address this issue. We need to look at a
> > few cases and see if there's a sledgehammer solution we can apply.
> >
> > On Fri, May 12, 2017 at 7:21 AM, Michael Brown <mi...@cloudera.com>
> wrote:
> >
> > > Why not alter the frontend test to drop t if exists? Tests should
> > generally
> > > strive to set themselves up. Is there some trait of the frontend tests
> > that
> > > prevents that?
> > >
> > > On Fri, May 12, 2017 at 4:38 AM, Lars Volker <l...@cloudera.com> wrote:
> > >
> > > > Hi All,
> > > >
> > > > I frequently create test tables on my local system with names like
> "t"
> > or
> > > > "p". A couple of frontend tests use the same names and then fail with
> > > > "Table already exists".
> > > >
> > > > Does anyone else hit this from time to time? Can we change the table
> > > names
> > > > in the tests to avoid single letter names? If there are no
> objections,
> > > I'll
> > > > open a JIRA.
> > > >
> > > > Thanks, Lars
> > > >
> > >
> >
>


Should we change tests so they don't use single letter table names?

2017-05-12 Thread Lars Volker
Hi All,

I frequently create test tables on my local system with names like "t" or
"p". A couple of frontend tests use the same names and then fail with
"Table already exists".

Does anyone else hit this from time to time? Can we change the table names
in the tests to avoid single letter names? If there are no objections, I'll
open a JIRA.

Thanks, Lars


Bookmarklet to find changes for a JIRA

2017-05-03 Thread Lars Volker
Hi All,

Sometimes I want to quickly see if there are any changes for a particular
JIRA. I found this bookmarklet to be helpful for that:

javascript:location.href='https://gerrit.cloudera.org/#/q/message:'+document.location["pathname"].split('/')[3];

Here's a link to drag to your bookmark toolbar: Search Gerrit
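For reference, the same path parsing in Python (a sketch mirroring the
bookmarklet's logic, not an official tool):

```python
from urllib.parse import urlparse


def gerrit_search_url(jira_url):
    # The issue key is the third path component of a JIRA browse URL,
    # matching the bookmarklet's pathname.split('/')[3].
    key = urlparse(jira_url).path.split("/")[3]
    return "https://gerrit.cloudera.org/#/q/message:" + key
```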

Re: subscribe impala maillist

2017-04-12 Thread Lars Volker
You need to send an email to dev-subscr...@impala.apache.org.

Cheers, Lars

On Wed, Apr 12, 2017 at 5:07 AM, yu feng  wrote:

> subscribe
>


Re: Is there any way to retrieve table metadata using select rather than show?

2017-04-06 Thread Lars Volker
Adding support to expose the metadata as tables is tracked in IMPALA-1761.
We would welcome your contribution to this.

On Thu, Apr 6, 2017 at 3:16 PM, Jeszy  wrote:

> Hey,
>
> that's not possible from within impala. If you go directly to the
> HMS's backing DB, you can query that.
> What information are you looking for?
>
> Thanks.
>
> On Thu, Apr 6, 2017 at 3:02 PM, 吴朱华  wrote:
> > Hi guys:
> >
> > Currently, we are using "show databases", "show tables" or "Describe
> > table" to retrieve table metadata, but we would like to use something
> > like "select * from metadata" instead, just like an RDBMS does ^_^
>


Re: Hi and a few Impala design questions :)

2017-04-05 Thread Lars Volker
There is also this page, which has another paper published by the Impala
team, as well as other related materials:
https://cwiki.apache.org/confluence/display/IMPALA/Impala+Reading+List


On Wed, Apr 5, 2017 at 7:02 PM, Dimitris Tsirogiannis <
dtsirogian...@cloudera.com> wrote:

> Hi Antoni,
>
> Regarding question 2. The catalog server collects file metadata, including
> block locations from the HDFS NameNode and caches them in memory. Over time,
> file metadata are broadcast using the statestore to all the Impala servers
> and stored in their local metadata caches.
>
> Dimitris
>
> On Tue, Apr 4, 2017 at 9:24 PM, Antoni Ivanov  wrote:
>
> > Hi,
> > I've been reading on design of catalog service/statestore.
> > Mostly from White paper about Impala - http://cidrdb.org/cidr2015/
> > Papers/CIDR15_Paper28.pdf
> > I got it from Impala confluence wiki https://cwiki.apache.org/
> > confluence/display/IMPALA/Impala+Presentations%2C+Papers+and+Blog+Posts
> > It’s rather interesting – it has fairly detailed (but clear) design of
> > different components
> >
> > Are there other sources (except the source code)?
> >
> > Question 2: I’ve been wondering does Impalad caches files location itself
> > – they don’t seem
> > to be stored in hive metastatore. Just the partition location is there,
> > right?
> >
> >
>


Best practice when closing issues?

2017-04-03 Thread Lars Volker
Do we have a best practice for the Status of finished work? Currently we
seem to use "Resolved" mostly, but sometime also "Closed". When should I
use which one?

Thanks, Lars

Status     Issues   Percentage
Open         1491   29%
Reopened       44   1%
Resolved     3550   69%
Closed         45   1%


Looking for a reviewer for scheduler benchmark

2017-04-01 Thread Lars Volker
This change adds a simple benchmark for the scheduler. It should aid in
identifying performance bottlenecks (for example, there's a heavily-used
string map) and safeguard performance-sensitive changes. It has been
dormant for a while, but I'd like to revive it and I'm looking for a
reviewer.

https://gerrit.cloudera.org/#/c/4554

Thanks, Lars


Re: Joining

2017-03-30 Thread Lars Volker
Hi Mahesh,

This list is the correct one. Please have a look at our Contributing to
Impala page
<https://cwiki.apache.org/confluence/display/IMPALA/Contributing+to+Impala>
and let us know if you have any questions. Feel free to reach out, too, if
you need help looking for a good first task to work on.

Cheers, Lars

On Wed, Mar 29, 2017 at 11:12 PM, Mahesh Balija <balijamahesh@gmail.com>
wrote:

> Hi Lars,
>
> Yes, I didn't get this mail or may be I have missed it from Sailesh.
> Thanks for the confirmation. But I did not see any activity on impala
> forum. Not sure if I am in right list.
>
> Best,
> Mahesh.B.
>
> On Wed, Mar 29, 2017 at 10:05 PM, Lars Volker <l...@cloudera.com> wrote:
>
>> Hi Mahesh,
>>
>> Sailesh replied to your email a few weeks back. Did you by any chance
>> miss his reply? If you're not subscribed to the dev@ mailing list, then
>> you often won't get a copy of replies, so it'd be a good start to
>> subscribe. :)
>>
>> Here's Sailesh's email:
>>
>> Hi Mahesh,
>>>
>>> Thanks for your interest! You can get started by having a look at the
>>> Contribution guidelines:
>>> https://cwiki.apache.org/confluence/display/IMPALA/Contribut
>>> ing+to+Impala
>>>
>>> It walks you through how to get started on your first patch. Let us know
>>> if
>>> you have any questions.
>>>
>>> - Sailesh
>>
>>
>> Cheers, Lars
>>
>>
>>
>> On Wed, Mar 29, 2017 at 11:01 PM, Mahesh Balija <
>> balijamahesh@gmail.com> wrote:
>>
>>> Hi Team,
>>>
>>> I would like to join Impala dev community.
>>>
>>> Best,
>>> Mahesh.B.
>>>
>>
>>
>


Re: Joining

2017-03-29 Thread Lars Volker
Hi Mahesh,

Sailesh replied to your email a few weeks back. Did you by any chance miss
his reply? If you're not subscribed to the dev@ mailing list, then you
often won't get a copy of replies, so it'd be a good start to subscribe. :)

Here's Sailesh's email:

Hi Mahesh,
>
> Thanks for your interest! You can get started by having a look at the
> Contribution guidelines:
> https://cwiki.apache.org/confluence/display/IMPALA/Contributing+to+Impala
>
> It walks you through how to get started on your first patch. Let us know if
> you have any questions.
>
> - Sailesh


Cheers, Lars



On Wed, Mar 29, 2017 at 11:01 PM, Mahesh Balija 
wrote:

> Hi Team,
>
> I would like to join Impala dev community.
>
> Best,
> Mahesh.B.
>


Re: Kudu version mismatch between impala-config.sh and toolchain

2017-03-29 Thread Lars Volker
Cool, thanks. My change has already been published to the S3 bucket, but I
hadn't updated the toolchain ID in the Impala repository. Could you push a
change to Impala once your publishing build is completed?

On Wed, Mar 29, 2017 at 7:21 PM, Matthew Jacobs <m...@cloudera.com> wrote:

> Lars, I'm starting a new toolchain build now with that change as well
> as your latest change:
>
> * 5364e21 - (upstream/master, gerrit/master) IMPALA-4226, IMPALA-4227:
> bump max threads, handle dwz compressed symbols (2 days ago) <Lars Volker>
>
> On Wed, Mar 29, 2017 at 9:52 AM, Matthew Jacobs <m...@cloudera.com> wrote:
> > I'll get that version in to the toolchain repo:
> > https://gerrit.cloudera.org/#/c/6509/
> >
> > On Wed, Mar 29, 2017 at 5:52 AM, Lars Volker <l...@cloudera.com> wrote:
> >> Hi All,
> >>
> >> In impala-config.sh we consume Kudu version 16dd6e4.
> >>
> >> bin/impala-config.sh:124:export IMPALA_KUDU_VERSION=16dd6e4
> >>
> >> The toolchain does not build that version, but instead defaults to
> 2b0edbe.
> >> This breaks using a custom toolchain for me. Even the historical
> versions
> >> don't contain 16dd6e4.
> >>
> >> KUDU_VERSIONS="0.8.0-RC1 0.9.0-RC1 0.10.0-RC1 1.0.0-RC1 f2aeba 60aa54e
> >> a70c905006 e018a83 cd7b0dd"
> >>
> >> Where does 16dd6e4 come from? How can we fix this?
> >>
> >> Thanks, Lars
>


Re: Kudu errors when using a custom toolchain

2017-03-29 Thread Lars Volker
I spoke too soon - it seems to be a version mismatch in the toolchain. I
started a new thread on that problem.

On Tue, Mar 28, 2017 at 10:30 PM, Lars Volker <l...@cloudera.com> wrote:

> Cool, thanks! That (and re-sourcing the impala config) seems to have done
> the trick.
>
> On Tue, Mar 28, 2017 at 6:30 PM, Tim Armstrong <tarmstr...@cloudera.com>
> wrote:
>
>> I think the distcc_env.sh script went rogue and deleted all the *cmake*
>> files under your toolchain. I'd suggest blowing away Kudu in the toolchain
>> and bootstrapping again.
>> We fixed clean.sh to not do that a while ago but I just noticed that the
>> distcc script wasn't fixed.
>>
>> On Tue, Mar 28, 2017 at 9:23 AM, Lars Volker <l...@cloudera.com> wrote:
>>
>> > I followed the steps outlined here
>> > <https://cwiki.apache.org/confluence/display/IMPALA/
>> > Building+native-toolchain+from+scratch+and+using+with+Impala>
>> > but when trying to build against my local toolchain I get the error
>> below.
>> > Does anyone know what I'm doing wrong? Thanks for the help, Lars.
>> >
>> > CMake Error at CMakeLists.txt:354 (find_package):
>> >   Could not find a package configuration file provided by "kuduClient"
>> with
>> >   any of the following names:
>> >
>> > kuduClientConfig.cmake
>> > kuduclient-config.cmake
>> >
>> >   Add the installation prefix of "kuduClient" to CMAKE_PREFIX_PATH or
>> set
>> >   "kuduClient_DIR" to a directory containing one of the above files.  If
>> >   "kuduClient" provides a separate development package or SDK, be sure
>> it
>> > has
>> >   been installed.
>> >
>> >
>> > -- Configuring incomplete, errors occurred!
>> > See also "/home/lv/i5/CMakeFiles/CMakeOutput.log".
>> > Error in /home/lv/i5/bin/make_impala.sh at line 160: cmake .
>> > ${CMAKE_ARGS[@]}
>> >
>>
>
>


Kudu version mismatch between impala-config.sh and toolchain

2017-03-29 Thread Lars Volker
Hi All,

In impala-config.sh we consume Kudu version 16dd6e4.

bin/impala-config.sh:124:export IMPALA_KUDU_VERSION=16dd6e4

The toolchain does not build that version, but instead defaults to 2b0edbe.
This breaks using a custom toolchain for me. Even the historical versions
don't contain 16dd6e4.

KUDU_VERSIONS="0.8.0-RC1 0.9.0-RC1 0.10.0-RC1 1.0.0-RC1 f2aeba 60aa54e
a70c905006 e018a83 cd7b0dd"
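A quick sanity check for this kind of mismatch could look like the following
(a sketch only; the version string and KUDU_VERSIONS list are taken from
this message, and the helper names are mine):

```python
import re


def configured_kudu_version(config_text):
    # Extract IMPALA_KUDU_VERSION from the contents of impala-config.sh.
    m = re.search(r"export IMPALA_KUDU_VERSION=(\S+)", config_text)
    return m.group(1) if m else None


def toolchain_builds_version(version, kudu_versions):
    # True if the configured version is among the toolchain's Kudu builds.
    return version in kudu_versions.split()
```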

Where does 16dd6e4 come from? How can we fix this?

Thanks, Lars


Re: Kudu errors when using a custom toolchain

2017-03-28 Thread Lars Volker
Cool, thanks! That (and re-sourcing the impala config) seems to have done
the trick.

On Tue, Mar 28, 2017 at 6:30 PM, Tim Armstrong <tarmstr...@cloudera.com>
wrote:

> I think the distcc_env.sh script went rogue and deleted all the *cmake*
> files under your toolchain. I'd suggest blowing away Kudu in the toolchain
> and bootstrapping again.
> We fixed clean.sh to not do that a while ago but I just noticed that the
> distcc script wasn't fixed.
>
> On Tue, Mar 28, 2017 at 9:23 AM, Lars Volker <l...@cloudera.com> wrote:
>
> > I followed the steps outlined here
> > <https://cwiki.apache.org/confluence/display/IMPALA/
> > Building+native-toolchain+from+scratch+and+using+with+Impala>
> > but when trying to build against my local toolchain I get the error
> below.
> > Does anyone know what I'm doing wrong? Thanks for the help, Lars.
> >
> > CMake Error at CMakeLists.txt:354 (find_package):
> >   Could not find a package configuration file provided by "kuduClient"
> with
> >   any of the following names:
> >
> > kuduClientConfig.cmake
> > kuduclient-config.cmake
> >
> >   Add the installation prefix of "kuduClient" to CMAKE_PREFIX_PATH or set
> >   "kuduClient_DIR" to a directory containing one of the above files.  If
> >   "kuduClient" provides a separate development package or SDK, be sure it
> > has
> >   been installed.
> >
> >
> > -- Configuring incomplete, errors occurred!
> > See also "/home/lv/i5/CMakeFiles/CMakeOutput.log".
> > Error in /home/lv/i5/bin/make_impala.sh at line 160: cmake .
> > ${CMAKE_ARGS[@]}
> >
>


Kudu errors when using a custom toolchain

2017-03-28 Thread Lars Volker
I followed the steps outlined here
<https://cwiki.apache.org/confluence/display/IMPALA/Building+native-toolchain+from+scratch+and+using+with+Impala>
but when trying to build against my local toolchain I get the error below.
Does anyone know what I'm doing wrong? Thanks for the help, Lars.

CMake Error at CMakeLists.txt:354 (find_package):
  Could not find a package configuration file provided by "kuduClient" with
  any of the following names:

kuduClientConfig.cmake
kuduclient-config.cmake

  Add the installation prefix of "kuduClient" to CMAKE_PREFIX_PATH or set
  "kuduClient_DIR" to a directory containing one of the above files.  If
  "kuduClient" provides a separate development package or SDK, be sure it
has
  been installed.


-- Configuring incomplete, errors occurred!
See also "/home/lv/i5/CMakeFiles/CMakeOutput.log".
Error in /home/lv/i5/bin/make_impala.sh at line 160: cmake .
${CMAKE_ARGS[@]}


Re: Min/Max runtime filtering on Impala-Kudu

2017-03-28 Thread Lars Volker
Thanks for sending this around, Sailesh.

parquet-column-stats.h is somewhat tied to Parquet statistics and will soon
need more Parquet-specific fields (null_count and distinct_count). It will
also need an extension to copy strings and manage their memory when
tracking min/max values for variable-length data across row batches. I'm
not sure how these changes will affect its suitability for your purposes,
but I thought I'd give a quick heads-up.
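Todd's suggestion further down in the thread - start with an exact IN(...)
set and collapse to a [min, max] range once a size threshold is exceeded -
could be sketched like this (an illustration only, not Impala's actual C++
implementation; the class name and threshold are assumptions):

```python
class MinMaxFilter:
    def __init__(self, max_in_list_size=32 * 1024):
        self.max_in_list_size = max_in_list_size
        self.values = set()  # exact values while below the threshold
        self.collapsed = False
        self.min_val = None
        self.max_val = None

    def insert(self, value):
        # Called for each value on the build side of the join.
        if self.collapsed:
            self.min_val = min(self.min_val, value)
            self.max_val = max(self.max_val, value)
            return
        self.values.add(value)
        if len(self.values) > self.max_in_list_size:
            # Too many distinct values: fall back to a range predicate.
            self.min_val = min(self.values)
            self.max_val = max(self.values)
            self.values = None
            self.collapsed = True

    def matches(self, value):
        # Exact membership while small; range check after collapsing.
        if not self.collapsed:
            return value in self.values
        return self.min_val <= value <= self.max_val
```

The exact form benefits partition pruning for low-cardinality joins against
hash-partitioned tables, while the collapsed range keeps memory bounded.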

On Mon, Mar 27, 2017 at 11:35 PM, Todd Lipcon  wrote:

> Sounds reasonable to me as well.
>
> The only thing I'd add is that, if it's easy to design the code to be
> extended to pushing small 'IN (...)' predicate for low-cardinality filters,
> that would be great. eg if the filter can start as an IN(...) and then if
> it exceeds 32K (or whatever arbitrary threshold), "collapse" it to the
> min/max range predicate?
>
> This should have a big advantage for partition pruning in low-cardinality
> joins against hash-partitioned tables.
>
> -Todd
>
>
>
> On Mon, Mar 27, 2017 at 2:29 PM, Matthew Jacobs  wrote:
>
> > Thanks for writing this up, Sailesh. It sounds reasonable.
> >
> > On Mon, Mar 27, 2017 at 2:24 PM, Sailesh Mukil 
> > wrote:
> > > On Mon, Mar 27, 2017 at 11:49 AM, Marcel Kornacker <
> mar...@cloudera.com>
> > > wrote:
> > >
> > >> On Mon, Mar 27, 2017 at 11:42 AM, Sailesh Mukil  >
> > >> wrote:
> > >> > I will be working on a patch to add min/max filter support in
> Impala,
> > and
> > >> > as a first step, specifically target the KuduScanNode, since the
> Kudu
> > >> > client is already able to accept a Min and a Max that it would
> > internally
> > >> > use to filter during its scans. Below is a brief design proposal.
> > >> >
> > >> > *Goal:*
> > >> >
> > >> > To leverage runtime min/max filter support in Kudu for the potential
> > >> speed
> > >> > up of queries over Kudu tables. Kudu does this by taking a min and a
> > max
> > >> > that Impala will provide and only return values in the range Impala
> is
> > >> > interested in.
> > >> >
> > >> > *[min <= range we're interested in >= max]*
> > >> >
> > >> > *Proposal:*
> > >> >
> > >> >
> > >> >- As a first step, plumb the runtime filter code from
> > >> > *exec/hdfs-scan-node-base.cc/h
> > >> >* to *exec/scan-node.cc/h
> > >> >*, so that it can be applied to
> > *KuduScanNode*
> > >> >cleanly as well, since *KuduScanNode* and *HdfsScanNodeBase* both
> > >> >inherit from *ScanNode.*
> > >>
> > >> Quick comment: please make sure your solution also applies to
> > >> KuduScanNodeMt.
> > >>
> > >
> > > Thanks for the input, I'll make sure to do that.
> > >
> > >
> > >>
> > >> >- Reuse the *ColumnStats* class (exec/parquet-column-stats.h) or
> > >> >implement a lighter weight version of it to process and store the
> > Min
> > >> and
> > >> >the Max on the build side of the join.
> > >> >- Once the Min and Max values are added to the existing runtime
> > filter
> > >> >structures, as a first step, we will ignore the Min and Max
> values
> > for
> > >> >non-Kudu tables. Using them for non-Kudu tables can come in as a
> > >> following
> > >> >patch(es).
> > >> >- Similarly, the bloom filter will be ignored for Kudu tables,
> and
> > >> only
> > >> >the Min and Max values will be used, since Kudu does not accept
> > bloom
> > >> >filters yet. (https://issues.apache.org/jira/browse/IMPALA-3741)
> > >> >- Applying the bloom filter on the Impala side of the Kudu scan
> > (i.e.
> > >> in
> > >> >KuduScanNode) is not in the scope of this patch.
> > >> >
> > >> >
> > >> > *Complications:*
> > >> >
> > >> >- We have to make sure that finding the Min and Max values on the
> > >> build
> > >> >side doesn't regress certain workloads, since the difference
> > between
> > >> >generating a bloom filter and generating a Min and a Max, is
> that a
> > >> bloom
> > >> >filter can be type agnostic (we just take a raw hash over the
> data)
> > >> whereas
> > >> >a Min and a Max have to be type specific.
> > >>
> >
>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>
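The type-specific min/max tracking described in the proposal above can be sketched as follows. This is an illustrative Python model only, not Impala's actual C++ ColumnStats/runtime-filter code; the class and method names are hypothetical.

```python
class MinMaxFilter:
    """Tracks the min and max of values seen on the build side of a
    join. NULLs (None here) are ignored, matching stats semantics
    where min/max only cover non-null values."""

    def __init__(self):
        self.min_val = None
        self.max_val = None

    def update(self, value):
        if value is None:  # NULLs don't participate in min/max
            return
        if self.min_val is None or value < self.min_val:
            self.min_val = value
        if self.max_val is None or value > self.max_val:
            self.max_val = value

    def may_contain(self, value):
        # A probe-side value can only match if it lies in [min, max].
        if self.min_val is None:  # build side was empty or all NULL
            return False
        return self.min_val <= value <= self.max_val


f = MinMaxFilter()
for v in [7, 3, None, 9]:
    f.update(v)
print(f.min_val, f.max_val)                  # 3 9
print(f.may_contain(5), f.may_contain(11))   # True False
```

Unlike a bloom filter's type-agnostic hashing, the comparisons here must be type specific, which is exactly the complication raised above.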


Re: Target Version/s vs Target Version

2017-03-24 Thread Lars Volker
The fields have been removed and only "Target Version" shows up in the
"Create" dialog. However we're the only project using this field and infra
asked in the JIRA whether we'd be willing to switch to using "Target
Version/s":

However I'd like to ask - we have a 'Target Version/s' already that 407
> projects use, you are the only project using the 'Target Version' - would
> you consider migrating to the most used one so we have only one such field
> in our Jira instance?
>

I don't feel strongly about this. What are your thoughts?

On Thu, Mar 23, 2017 at 6:54 PM, Lars Volker <l...@cloudera.com> wrote:

> Please have a look at the reply on the JIRA. It looks like it'd be a
> considerable amount of work to change this and I'm not sure it's worth it.
>
> On Thu, Mar 23, 2017 at 6:22 PM, Lars Volker <l...@cloudera.com> wrote:
>
>> I created INFRA-13738 <https://issues.apache.org/jira/browse/INFRA-13738> to
>> track this.
>>
>> On Thu, Mar 23, 2017 at 4:55 PM, Jim Apple <jbap...@cloudera.com> wrote:
>>
>>> > I don't know if there's a way to disable a custom field for a project.
>>> > Bharath, Jim, do you know?
>>>
>>> I'd suggest filing an INFRA ticket. I looked at
>>> https://issues.apache.org/jira/plugins/servlet/project-confi
>>> g/IMPALA/fields
>>> and I don't think I have edit rights.
>>>
>>
>>
>


Re: Target Version/s vs Target Version

2017-03-23 Thread Lars Volker
Please have a look at the reply on the JIRA. It looks like it'd be a
considerable amount of work to change this and I'm not sure it's worth it.

On Thu, Mar 23, 2017 at 6:22 PM, Lars Volker <l...@cloudera.com> wrote:

> I created INFRA-13738 <https://issues.apache.org/jira/browse/INFRA-13738> to
> track this.
>
> On Thu, Mar 23, 2017 at 4:55 PM, Jim Apple <jbap...@cloudera.com> wrote:
>
>> > I don't know if there's a way to disable a custom field for a project.
>> > Bharath, Jim, do you know?
>>
>> I'd suggest filing an INFRA ticket. I looked at
>> https://issues.apache.org/jira/plugins/servlet/project-confi
>> g/IMPALA/fields
>> and I don't think I have edit rights.
>>
>
>


Re: Target Version/s vs Target Version

2017-03-23 Thread Lars Volker
I created INFRA-13738  to
track this.

On Thu, Mar 23, 2017 at 4:55 PM, Jim Apple  wrote:

> > I don't know if there's a way to disable a custom field for a project.
> > Bharath, Jim, do you know?
>
> I'd suggest filing an INFRA ticket. I looked at
> https://issues.apache.org/jira/plugins/servlet/project-
> config/IMPALA/fields
> and I don't think I have edit rights.
>


Target Version/s vs Target Version

2017-03-23 Thread Lars Volker
I noticed that new JIRAs can have values for both "Target Version/s"
(plural) and "Target Version" (singular). Is there a preference for
populating one over the other? Can we get rid of one?

The plural one only has 13 issues using it: Relevant JIRA Search


I manually fixed the only two issues that had the plural set and an empty
singular field. Is there a way to disable the "Target Version/s" field for
Impala? There seem to be two such fields; the 13 issues are using the one
ending in 820.

[image: Inline image 1]

Cheers, Lars


Re: GVO request

2017-02-28 Thread Lars Volker
Started a job, saw Jim's job afterwards. Will cancel mine.

On Tue, Feb 28, 2017 at 3:53 PM, Bharath Vissapragada  wrote:

> Can someone GVO this please.
>
> https://gerrit.cloudera.org/#/c/5792/
>


Re: Toolchain - versioning dependencies with the same version number

2017-02-28 Thread Lars Volker
Can we add another version string component like -1 or -impala1, or add a
dummy patch to the affected packages, to allow new builds with the same
upstream version? I think this is what Linux distributions commonly do to
ship several package revisions of one upstream version.

On Feb 27, 2017 21:15, "Henry Robinson"  wrote:

Yes, it would force re-downloading. At my office, downloading a toolchain
takes only a few seconds, so I'm not sure the cost is that great.
And if it turned out to be problematic, one could always change the
toolchain directory for different branches. Having something locally that
set IMPALA_TOOLCHAIN_DIR=${IMPALA_HOME}/${IMPALA_TOOLCHAIN_BUILD_ID}/ would
work.

However, I wouldn't want to force that behaviour into the toolchain scripts
because of the garbage collection problem it would raise - it wouldn't be
clear when old toolchains could be deleted programmatically.

On 27 February 2017 at 20:51, Tim Armstrong  wrote:

> Maybe I'm misunderstanding, but wouldn't that force re-downloading of the
> entire toolchain every time a developer switches between branches with
> different build IDs?
>
> I know some developers do that frequently, e.g. to try and reproduce bugs
> on older versions or backport patches.
>
> I agree it would be good to fix this, since I've run into this problem
> before, I'm just not quite sure what the best solution is. In the other
> case where I had this issue with LLVM I changed the version number (by
> appending noasserts- to it), but that's really just a hack.
>
> -Tim
>
> On Mon, Feb 27, 2017 at 4:35 PM, Henry Robinson 
> wrote:
>
> > As Matt said, I have a patch that implements build ID-based versioning
at
> > https://gerrit.cloudera.org/#/c/6166/2.
> >
> > Does anyone want to take a look? If we could get this in soon it would
> help
> > smooth over the LZ4 change which is going in shortly.
> >
> > On 27 February 2017 at 14:21, Henry Robinson  wrote:
> >
> > > I agree that that might be useful, and that it's a separately
> addressable
> > > problem.
> > >
> > > On 27 February 2017 at 14:18, Matthew Jacobs  wrote:
> > >
> > >> Just catching up to this e-mail, though I had seen your code reviews
> > >> and I think this approach makes sense. An additional concern would be
> > >> how to identify how a toolchain package was built, and AFAIK this is
> > >> tricky now if only the 'toolchain ID' is known. Before I saw this
> > >> e-mail I was thinking about this problem (which I think we can
address
> > >> separately), and that we might want to write the native-toolchain git
> > >> hash with every toolchain build so that the exact build scripts are
> > >> associated with those build artifacts. I filed
> > >> https://issues.cloudera.org/browse/IMPALA-5002 for this related
> > >> problem.
> > >>
> > >> On Sat, Feb 25, 2017 at 10:22 PM, Henry Robinson 
> > >> wrote:
> > >> > As written, the toolchain can't apparently deal with the
possibility
> > of
> > >> > build flags changing, but a dependency version remaining the same.
> > >> >
> > >> > LZ4 has never (afaict) been built with optimization enabled. I have
> a
> > >> > commit that enables -O3, but that continues to produce artifacts
for
> > >> > lz4-1.7.5 with no version change. This is a problem because
> > >> bootstrapping
> > >> > the toolchain will fail to pick up the new binaries - because the
> > >> > previously downloaded version is still in the local cache, and
won't
> > be
> > >> > overwritten because of the version change.
> > >> >
> > >> > I think the simplest way to fix this is to write the toolchain
build
> > ID
> > >> to
> > >> > the dependency version file (that's in the local cache only) when
> it's
> > >> > downloaded. If that ID changes, the dependency will be
> re-downloaded.
> > >> >
> > >> > This has the disadvantage that any bump in
IMPALA_TOOLCHAIN_BUILD_ID
> > >> will
> > >> > invalidate all dependencies, and bin/bootstrap_toolchain.py will
> > >> > re-download all of them. My feeling is that that cost is better
than
> > >> trying
> > >> > to individually determine whether a dependency has changed between
> > >> > toolchain builds.
> > >> >
> > >> > Any thoughts on whether this is the right way to go?
> > >> >
> > >> > Henry
> > >>
> > >
> > >
> > >
> > > --
> > > Henry Robinson
> > > Software Engineer
> > > Cloudera
> > > 415-994-6679 <(415)%20994-6679>
> > >
> >
> >
> >
> > --
> > Henry Robinson
> > Software Engineer
> > Cloudera
> > 415-994-6679
> >
>



--
Henry Robinson
Software Engineer
Cloudera
415-994-6679 <(415)%20994-6679>


Re: jenkins.impala.io password reset request, thanks

2017-02-24 Thread Lars Volker
Having a look

On Fri, Feb 24, 2017 at 10:54 AM, Michael Brown  wrote:

>
>


Re: status-benchmark.cc compilation time

2017-02-21 Thread Lars Volker
I think -notests already skips the benchmarks. However, I understood the
proposal to be disabling the benchmark build even without -notests, i.e.
they'd be disabled by default and you'd need to specify -build_benchmarks
to build them.

I'm in favor of doing that, including building them in exhaustive runs.

On Tue, Feb 21, 2017 at 10:50 AM, Alex Behm  wrote:

> +1 for not compiling the benchmarks in -notests
>
> On Mon, Feb 20, 2017 at 7:55 PM, Jim Apple  wrote:
>
> > > On which note, would anyone object if we disabled benchmark compilation
> > by
> > > default when building the BE tests? I mean separating out -notests into
> > > -notests and -build_benchmarks (the latter false by default).
> >
> > I think this is a great idea.
> >
> > > I don't mind if the benchmarks bitrot as a result, because we don't run
> > > them regularly or pay attention to their output except when developing
> a
> > > feature. Of course, maybe an 'exhaustive' run should build the
> benchmarks
> > > as well just to keep us honest, but I'd be happy if 95% of Jenkins
> builds
> > > didn't bother.
> >
> > The pre-merge (aka GVM aka GVO) testing builds
> > http://jenkins.impala.io:8080/job/all-build-options, which builds
> > without the "-notests" flag.
> >
>


Unify Jira Labels for "Documentation"

2017-02-20 Thread Lars Volker
In addition to the Documentation component, we currently have four
different labels for documentation related Jiras: doc, doc-production,
docs, and documentation.

Can we agree on only one of them and remove the others? Are there different
meanings to them that I'm not aware of?

Thanks, Lars


Re: If FE tests hang on your machine, try restarting the minicluster

2017-02-17 Thread Lars Volker
After restarting the minicluster I reran the tests and that seems to have
overwritten the logfiles. :(

On Feb 17, 2017 12:24, "Todd Lipcon" <t...@cloudera.com> wrote:

> Hey Lars,
>
> Do you have any FE logs from the test that hung? Maybe there's something in
> there that can help us figure out why the metadata loading hung -- at the
> worst it should timeout, not hang forever.
>
> -Todd
>
> On Fri, Feb 17, 2017 at 11:56 AM, Lars Volker <l...@cloudera.com> wrote:
>
> > Hi All,
> >
> > The frontend tests seemed to hang on my local dev machine. Running jstack
> > on the child of a hanging test process gave the attached stacks.
> >
> > From the stacks it looked like Kudu was implicated in the hang, and a
> > restart of the local minicluster fixed the issue. Hope it helps someone.
> :)
> >
> > Cheers, Lars
> >
>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>


If FE tests hang on your machine, try restarting the minicluster

2017-02-17 Thread Lars Volker
Hi All,

The frontend tests seemed to hang on my local dev machine. Running jstack
on the child of a hanging test process gave the attached stacks.

From the stacks it looked like Kudu was implicated in the hang, and a
restart of the local minicluster fixed the issue. Hope it helps someone. :)

Cheers, Lars
2017-02-17 11:48:21
Full thread dump Java HotSpot(TM) 64-Bit Server VM (24.75-b04 mixed mode):

"Hashed wheel timer #7" daemon prio=10 tid=0x7f8964051000 nid=0x2715 
sleeping[0x7f884003f000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at 
org.apache.kudu.client.shaded.org.jboss.netty.util.HashedWheelTimer$Worker.waitForNextTick(HashedWheelTimer.java:445)
at 
org.apache.kudu.client.shaded.org.jboss.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:364)
at 
org.apache.kudu.client.shaded.org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
at java.lang.Thread.run(Thread.java:745)

   Locked ownable synchronizers:
- None

"New I/O boss #53" daemon prio=10 tid=0x7f8964050800 nid=0x2714 runnable 
[0x7f8840342000]
   java.lang.Thread.State: RUNNABLE
at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
- locked <0x0007a98050c8> (a sun.nio.ch.Util$2)
- locked <0x0007a98050b8> (a java.util.Collections$UnmodifiableSet)
- locked <0x0007a9804fa0> (a sun.nio.ch.EPollSelectorImpl)
at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98)
at 
org.apache.kudu.client.shaded.org.jboss.netty.channel.socket.nio.SelectorUtil.select(SelectorUtil.java:68)
at 
org.apache.kudu.client.shaded.org.jboss.netty.channel.socket.nio.AbstractNioSelector.select(AbstractNioSelector.java:434)
at 
org.apache.kudu.client.shaded.org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:212)
at 
org.apache.kudu.client.shaded.org.jboss.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:42)
at 
org.apache.kudu.client.shaded.org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
at 
org.apache.kudu.client.shaded.org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

   Locked ownable synchronizers:
- <0x0007a98153d0> (a 
java.util.concurrent.ThreadPoolExecutor$Worker)

"New I/O worker #52" daemon prio=10 tid=0x7f896404f800 nid=0x2713 runnable 
[0x7f8840443000]
   java.lang.Thread.State: RUNNABLE
at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
- locked <0x0007a96d5648> (a sun.nio.ch.Util$2)
- locked <0x0007a96d5638> (a java.util.Collections$UnmodifiableSet)
- locked <0x0007a96d5520> (a sun.nio.ch.EPollSelectorImpl)
at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98)
at 
org.apache.kudu.client.shaded.org.jboss.netty.channel.socket.nio.SelectorUtil.select(SelectorUtil.java:68)
at 
org.apache.kudu.client.shaded.org.jboss.netty.channel.socket.nio.AbstractNioSelector.select(AbstractNioSelector.java:434)
at 
org.apache.kudu.client.shaded.org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:212)
at 
org.apache.kudu.client.shaded.org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
at 
org.apache.kudu.client.shaded.org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
at 
org.apache.kudu.client.shaded.org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
at 
org.apache.kudu.client.shaded.org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

   Locked ownable synchronizers:
- <0x0007a98046f8> (a 
java.util.concurrent.ThreadPoolExecutor$Worker)

"New I/O worker #51" daemon prio=10 tid=0x7f896404e800 nid=0x2712 runnable 
[0x7f884014]
   java.lang.Thread.State: RUNNABLE
at sun.nio.ch.EPollArrayWrapper.epollWait(Native 

Re: Can a Committer Please Carry the +2 and Submit this Change for me?

2017-02-02 Thread Lars Volker
Thank you!

On Thu, Feb 2, 2017 at 2:22 AM, Tim Armstrong <tarmstr...@cloudera.com>
wrote:

> Will do
>
> On Wed, Feb 1, 2017 at 5:20 PM, Lars Volker <l...@cloudera.com> wrote:
>
> > https://gerrit.cloudera.org/#/c/5611/12
> >
> > Patch Set 11 has a +2 from Marcel. In Patch Set 12 I rebased the change,
> > replaced NULL with nullptr in one .cc file, and removed "// clang-format"
> > control statements and TODOs.
> >
> > Thanks for the help, Lars
> >
>


Can a Committer Please Carry the +2 and Submit this Change for me?

2017-02-01 Thread Lars Volker
https://gerrit.cloudera.org/#/c/5611/12

Patch Set 11 has a +2 from Marcel. In Patch Set 12 I rebased the change,
replaced NULL with nullptr in one .cc file, and removed "// clang-format"
control statements and TODOs.

Thanks for the help, Lars


Re: Python string formatting

2017-02-01 Thread Lars Volker
Thanks for the feedback. Printing all tests and making sure the names stay
the same is a very good idea.

I also just stumbled over yapf (https://github.com/google/yapf) which is a
python code formatter based on clang-format's logic. I will check that one
out too and try to come up with a patch for review when I have some spare
time.

On Wed, Feb 1, 2017 at 3:33 AM, Jim Apple <jbap...@cloudera.com> wrote:

> 1. Autopep8 might break a part of a test that is rarely run, so even
> if exhaustive works, something may have gotten messed up.
>
> 2. David pointed out that you can print the tests that are scheduled
> to be run by using pytest's dry-run option. You can then check that
> the number of tests (and maybe even the configuration and names of
> them?) stay the same after autopep8/
>
> 3. You could use https://docs.python.org/2/library/ast.html to make
> sure that only the whitespace is being changed and the AST is staying
> the same. That might be overkill.
>
> On Tue, Jan 31, 2017 at 5:53 PM, Lars Volker <l...@cloudera.com> wrote:
> > Thanks for pointing this out. Yes, I thought of trying autopep8 and
> running
> > an exhaustive build. I assumed it wouldn't break tests, but that is
> > probably a naive assumption. However, if nothing breaks in an exhaustive
> > run I'd be quite confident that nothing else will. That leaves the risk
> > that a change made by autopep8 disables a test so we don't notice the
> > breakage. Other than manually reviewing the code I don't have an idea how
> > to prevent that.
> >
> > Thoughts?
> >
> > On Tue, Jan 31, 2017 at 8:29 PM, Jim Apple <jbap...@cloudera.com> wrote:
> >
> >> Will you use autopep8? If so, how will you check that it doesn't break
> >> something on an infrequently-used codepath?
> >>
> >> On Tue, Jan 31, 2017 at 11:12 AM, Michael Brown <mi...@cloudera.com>
> >> wrote:
> >> > I agree.
> >> >
> >> > On Tue, Jan 31, 2017 at 10:44 AM, Lars Volker <l...@cloudera.com>
> wrote:
> >> >> Thanks for the feedback here and in the review.
> >> >>
> >> >> I agree that we should aim for a style as close to PEP8 as possible.
> >> >> However, I also didn't want to overshoot and my first goal was to get
> >> some
> >> >> useful tooling set up, so that I don't have to constantly worry about
> >> >> formatting. Once I had figured out some tooling, I thought I might as
> >> well
> >> >> share it and solicit feedback.
> >> >>
> >> >> Regarding the next steps, I'm open for anything really. I didn't know
> >> about
> >> >> the --diff switch of flake8, that looks very useful. Even better of
> >> course
> >> >> would be, if all python code could be converted to PEP8.
> >> >>
> >> >> Here is a list of all PEP8 violations and their count, obtained with
> >> >> "pycodestyle --statistics -qq tests":
> >> >>
> >> >> 9017    E111 indentation is not a multiple of four
> >> >> 902 E114 indentation is not a multiple of four (comment)
> >> >> 2   E116 unexpected indentation (comment)
> >> >> 24  E122 continuation line missing indentation or outdented
> >> >> 5   E124 closing bracket does not match visual indentation
> >> >> 105 E125 continuation line with same indent as next logical line
> >> >> 43  E127 continuation line over-indented for visual indent
> >> >> 1038    E128 continuation line under-indented for visual indent
> >> >> 7   E131 continuation line unaligned for hanging indent
> >> >> 13  E201 whitespace after '('
> >> >> 8   E202 whitespace before ']'
> >> >> 55  E203 whitespace before ':'
> >> >> 5   E211 whitespace before '['
> >> >> 5   E221 multiple spaces before operator
> >> >> 7   E222 multiple spaces after operator
> >> >> 9   E225 missing whitespace around operator
> >> >> 1   E227 missing whitespace around bitwise or shift operator
> >> >> 127 E231 missing whitespace after ':'
> >> >> 157 E251 unexpected spaces around keyword / parameter equals
> >> >> 20  E261 at least two spaces before inline comment
> >> >> 21  E265 block comment should start with '# '
> >> >> 1   E266 too many leading '#' for block comment
> >> >> 1   E271 multiple spaces after keyword
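The AST comparison Jim suggests in point 3 above can be done in a few lines. A minimal sketch; note that comments are not part of the AST, so comment-only edits also pass this check:

```python
import ast


def same_ast(src_before, src_after):
    """True if both sources parse to the same abstract syntax tree,
    i.e. the reformat changed layout but not program structure.
    ast.dump() omits line/column info by default, so pure whitespace
    changes compare equal."""
    return ast.dump(ast.parse(src_before)) == ast.dump(ast.parse(src_after))


before = "x=[1,2,3]\nprint( x )"
after = "x = [1, 2, 3]\nprint(x)"
print(same_ast(before, after))           # True
print(same_ast(before, "x = [1, 2]"))    # False
```

Run over each file before and after autopep8, this catches any change that goes beyond formatting.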

Re: Python string formatting

2017-01-31 Thread Lars Volker
Thanks for pointing this out. Yes, I thought of trying autopep8 and running
an exhaustive build. I assumed it wouldn't break tests, but that is
probably a naive assumption. However, if nothing breaks in an exhaustive
run I'd be quite confident that nothing else will. That leaves the risk
that a change made by autopep8 disables a test so we don't notice the
breakage. Other than manually reviewing the code I don't have an idea how
to prevent that.

Thoughts?

On Tue, Jan 31, 2017 at 8:29 PM, Jim Apple <jbap...@cloudera.com> wrote:

> Will you use autopep8? If so, how will you check that it doesn't break
> something on an infrequently-used codepath?
>
> On Tue, Jan 31, 2017 at 11:12 AM, Michael Brown <mi...@cloudera.com>
> wrote:
> > I agree.
> >
> > On Tue, Jan 31, 2017 at 10:44 AM, Lars Volker <l...@cloudera.com> wrote:
> >> Thanks for the feedback here and in the review.
> >>
> >> I agree that we should aim for a style as close to PEP8 as possible.
> >> However, I also didn't want to overshoot and my first goal was to get
> some
> >> useful tooling set up, so that I don't have to constantly worry about
> >> formatting. Once I had figured out some tooling, I thought I might as
> well
> >> share it and solicit feedback.
> >>
> >> Regarding the next steps, I'm open for anything really. I didn't know
> about
> >> the --diff switch of flake8, that looks very useful. Even better of
> course
> >> would be, if all python code could be converted to PEP8.
> >>
> >> Here is a list of all PEP8 violations and their count, obtained with
> >> "pycodestyle --statistics -qq tests":
> >>
> >> 9017    E111 indentation is not a multiple of four
> >> 902 E114 indentation is not a multiple of four (comment)
> >> 2   E116 unexpected indentation (comment)
> >> 24  E122 continuation line missing indentation or outdented
> >> 5   E124 closing bracket does not match visual indentation
> >> 105 E125 continuation line with same indent as next logical line
> >> 43  E127 continuation line over-indented for visual indent
> >> 1038    E128 continuation line under-indented for visual indent
> >> 7   E131 continuation line unaligned for hanging indent
> >> 13  E201 whitespace after '('
> >> 8   E202 whitespace before ']'
> >> 55  E203 whitespace before ':'
> >> 5   E211 whitespace before '['
> >> 5   E221 multiple spaces before operator
> >> 7   E222 multiple spaces after operator
> >> 9   E225 missing whitespace around operator
> >> 1   E227 missing whitespace around bitwise or shift operator
> >> 127 E231 missing whitespace after ':'
> >> 157 E251 unexpected spaces around keyword / parameter equals
> >> 20  E261 at least two spaces before inline comment
> >> 21  E265 block comment should start with '# '
> >> 1   E266 too many leading '#' for block comment
> >> 1   E271 multiple spaces after keyword
> >> 4   E301 expected 1 blank line, found 0
> >> 313 E302 expected 2 blank lines, found 1
> >> 16  E303 too many blank lines (2)
> >> 13  E305 expected 2 blank lines after class or function definition,
> >> found 1
> >> 6   E306 expected 1 blank line before a nested definition, found 0
> >> 7   E402 module level import not at top of file
> >> 3800    E501 line too long (80 > 79 characters)
> >> 278 E502 the backslash is redundant between brackets
> >> 87  E701 multiple statements on one line (colon)
> >> 74  E703 statement ends with a semicolon
> >> 12  E711 comparison to None should be 'if cond is None:'
> >> 9   E712 comparison to False should be 'if cond is False:' or 'if
> not
> >> cond:'
> >> 2   E713 test for membership should be 'not in'
> >> 2   E741 ambiguous variable name 'l'
> >> 1   W292 no newline at end of file
> >> 9   W391 blank line at end of file
> >> 2   W601 .has_key() is deprecated, use 'in'
> >> 19  W602 deprecated form of raising exception
> >>
> >> If we take out the well known ones (indent, line width), it does not
> look
> >> too far fetched to me to change it all to PEP8.
> >>
> >> Thoughts?
> >>
> >>
> >>
> >> On Tue, Jan 31, 2017 at 5:59 PM, Michael Brown <mi...@cloudera.com>
> wrote:
> >>
> >>> Thanks. I made some comments on the review, but I see now 

Re: Python string formatting

2017-01-31 Thread Lars Volker
Thanks for the feedback here and in the review.

I agree that we should aim for a style as close to PEP8 as possible.
However, I also didn't want to overshoot and my first goal was to get some
useful tooling set up, so that I don't have to constantly worry about
formatting. Once I had figured out some tooling, I thought I might as well
share it and solicit feedback.

Regarding the next steps, I'm open for anything really. I didn't know about
the --diff switch of flake8, that looks very useful. Even better of course
would be, if all python code could be converted to PEP8.

Here is a list of all PEP8 violations and their count, obtained with
"pycodestyle --statistics -qq tests":

9017    E111 indentation is not a multiple of four
902 E114 indentation is not a multiple of four (comment)
2   E116 unexpected indentation (comment)
24  E122 continuation line missing indentation or outdented
5   E124 closing bracket does not match visual indentation
105 E125 continuation line with same indent as next logical line
43  E127 continuation line over-indented for visual indent
1038    E128 continuation line under-indented for visual indent
7   E131 continuation line unaligned for hanging indent
13  E201 whitespace after '('
8   E202 whitespace before ']'
55  E203 whitespace before ':'
5   E211 whitespace before '['
5   E221 multiple spaces before operator
7   E222 multiple spaces after operator
9   E225 missing whitespace around operator
1   E227 missing whitespace around bitwise or shift operator
127 E231 missing whitespace after ':'
157 E251 unexpected spaces around keyword / parameter equals
20  E261 at least two spaces before inline comment
21  E265 block comment should start with '# '
1   E266 too many leading '#' for block comment
1   E271 multiple spaces after keyword
4   E301 expected 1 blank line, found 0
313 E302 expected 2 blank lines, found 1
16  E303 too many blank lines (2)
13  E305 expected 2 blank lines after class or function definition,
found 1
6   E306 expected 1 blank line before a nested definition, found 0
7   E402 module level import not at top of file
3800    E501 line too long (80 > 79 characters)
278 E502 the backslash is redundant between brackets
87  E701 multiple statements on one line (colon)
74  E703 statement ends with a semicolon
12  E711 comparison to None should be 'if cond is None:'
9   E712 comparison to False should be 'if cond is False:' or 'if not
cond:'
2   E713 test for membership should be 'not in'
2   E741 ambiguous variable name 'l'
1   W292 no newline at end of file
9   W391 blank line at end of file
2   W601 .has_key() is deprecated, use 'in'
19  W602 deprecated form of raising exception

If we take out the well known ones (indent, line width), it does not look
too far fetched to me to change it all to PEP8.

Thoughts?



On Tue, Jan 31, 2017 at 5:59 PM, Michael Brown <mi...@cloudera.com> wrote:

> Thanks. I made some comments on the review, but I see now I should
> probably share my general view here.
>
> My general view is, if we are going to codify our Python style guide,
> I would rather we codify style conventions that are closer to standard
> Python style conventions, rather than codify what is currently done. I
> am willing to keep 2-space indents and 90-char lines, but I don't
> think anything else should be part of the conventions when those
> conventions involves ignoring PEP-008. My instinct tells me the Python
> conventions weren't conventions at all, but came up organically
> without regard to actually reading conventions or using tooling.
> Otherwise, we'd have already had a Python style guide, right?
>
> If the concern is "But there are too many noisy errors if I am editing
> an existing, large file, so we should ignore these anyway", something
> like this is possible:
>
> git diff | flake8 --diff
>
> This will only show PEP-008 problems on changed code, not whole files.
>
>
>
> On Mon, Jan 30, 2017 at 3:20 PM, Lars Volker <l...@cloudera.com> wrote:
> > Cool, thanks Michael for the reply. I added a section on Python to the
> Impala
> > Style Guide
> > <https://cwiki.apache.org/confluence/display/IMPALA/Impala+Style+Guide>.
> > Please feel free to edit it or let me know if I should make changes. I
> will
> > also send out a review to add a .pep8rc file to the repository.
> >
> > On Fri, Jan 27, 2017 at 11:56 PM, Michael Brown <mi...@cloudera.com>
> wrote:
> >
> >> I prefer str.format() over the % operator, because:
> >>
> >> https://docs.python.org/2.7/library/stdtypes.html#str.format
> >>
> >> "This method of string formatting is the new standard in Python 3, and
> >> should

Re: Python string formatting

2017-01-30 Thread Lars Volker
Here's a change with the .pep8rc file I use:
https://gerrit.cloudera.org/#/c/5829/

On Tue, Jan 31, 2017 at 12:20 AM, Lars Volker <l...@cloudera.com> wrote:

> Cool, thanks Michael for the reply. I added a section on Python to the Impala
> Style Guide
> <https://cwiki.apache.org/confluence/display/IMPALA/Impala+Style+Guide>.
> Please feel free to edit it or let me know if I should make changes. I will
> also send out a review to add a .pep8rc file to the repository.
>
> On Fri, Jan 27, 2017 at 11:56 PM, Michael Brown <mi...@cloudera.com>
> wrote:
>
>> I prefer str.format() over the % operator, because:
>>
>> https://docs.python.org/2.7/library/stdtypes.html#str.format
>>
>> "This method of string formatting is the new standard in Python 3, and
>> should be preferred to the % formatting described in String Formatting
>> Operations in new code."
>>
>> Without an Impala Python style guide, I tend to use what I see on
>> docs.python.org, modulo our 2-space indent and 90-char line policy.
>>
>>
>> On Fri, Jan 27, 2017 at 2:44 PM, Lars Volker <l...@cloudera.com> wrote:
>> > Hi All,
>> >
>> > do we have a strong preference for either old style or new style string
>> > formatting in Python?
>> >
>> > "Hello %s!" % ("world") *vs* "Hello {0}!".format("world")
>> >
>> > The Impala Style Guide
>> > <https://cwiki.apache.org/confluence/display/IMPALA/Impala+Style+Guide>
>> doesn't
>> > mention Python at all.
>> >
>> > Thanks, Lars
>>
>
>


Re: Python string formatting

2017-01-30 Thread Lars Volker
Cool, thanks Michael for the reply. I added a section on Python to the Impala
Style Guide
<https://cwiki.apache.org/confluence/display/IMPALA/Impala+Style+Guide>.
Please feel free to edit it or let me know if I should make changes. I will
also send out a review to add a .pep8rc file to the repository.

On Fri, Jan 27, 2017 at 11:56 PM, Michael Brown <mi...@cloudera.com> wrote:

> I prefer str.format() over the % operator, because:
>
> https://docs.python.org/2.7/library/stdtypes.html#str.format
>
> "This method of string formatting is the new standard in Python 3, and
> should be preferred to the % formatting described in String Formatting
> Operations in new code."
>
> Without an Impala Python style guide, I tend to use what I see on
> docs.python.org, modulo our 2-space indent and 90-char line policy.
>
>
> On Fri, Jan 27, 2017 at 2:44 PM, Lars Volker <l...@cloudera.com> wrote:
> > Hi All,
> >
> > do we have a strong preference for either old style or new style string
> > formatting in Python?
> >
> > "Hello %s!" % ("world") *vs* "Hello {0}!".format("world")
> >
> > The Impala Style Guide
> > <https://cwiki.apache.org/confluence/display/IMPALA/Impala+Style+Guide>
> doesn't
> > mention Python at all.
> >
> > Thanks, Lars
>
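As an aside to the thread above, the two styles produce identical results for simple cases; the main practical differences are readability and the % operator's special handling of tuples. A minimal sketch (not from the original thread):

```python
# Old-style % formatting vs. new-style str.format(), as debated above.
old = "Hello %s!" % ("world",)        # a 1-tuple; bare ("world") also works
new = "Hello {0}!".format("world")
assert old == new == "Hello world!"

# One pitfall of %: a bare tuple argument is unpacked as multiple values,
# so wrapping the value in a 1-tuple is the safe habit.
pair = (1, 2)
assert "pair: %s" % (pair,) == "pair: (1, 2)"
```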


Re: Do we have a table in 'functional' with negative int/float values?

2017-01-26 Thread Lars Volker
Good to know, thanks for your reply.

On Wed, Jan 25, 2017 at 5:42 AM, Alex Behm <alex.b...@cloudera.com> wrote:

> I don't think we have a test table in 'functional' with negative values.
>
> As far as I can tell, the table 'functional.overflow' is a text table that
> contains values that are larger/smaller than the maximum/minimum type
> declared for that column.
> When overflowing during a text scan, Impala will materialize the max value
> for that column (based on the column type in the schema), whereas Hive will
> return NULL or -Infinity.
> So I'd say the behavior you are seeing is expected.
>
> Alex
>
> On Tue, Jan 24, 2017 at 4:14 PM, Lars Volker <l...@cloudera.com> wrote:
>
> > Hi all,
> >
> > while writing tests for IMPALA-3909 I tried to find a test table in
> > 'functional' that has non-pathologic negative integer and float values
> but
> > couldn't find one. Does anyone know of a table with negative values in
> it?
> >
> > I did find functional.overflow, but Hive returns NULL when querying it.
> Is
> > that expected or a Hive bug?
> >
> > Thanks a lot, Lars
> >
>


Do we have a table in 'functional' with negative int/float values?

2017-01-24 Thread Lars Volker
Hi all,

while writing tests for IMPALA-3909 I tried to find a test table in
'functional' that has non-pathologic negative integer and float values but
couldn't find one. Does anyone know of a table with negative values in it?

I did find functional.overflow, but Hive returns NULL when querying it. Is
that expected or a Hive bug?

Thanks a lot, Lars


Re: Can a committer please start the submit job for this change?

2017-01-12 Thread Lars Volker
Thank you!

On Thu, Jan 12, 2017 at 3:40 PM, Jim Apple <jbap...@cloudera.com> wrote:

> done
>
> On Thu, Jan 12, 2017 at 4:19 AM, Lars Volker <l...@cloudera.com> wrote:
> > https://gerrit.cloudera.org/#/c/5669/
> >
> > Thanks, Lars
>


"target version" and "fix version" fields for 2.8/2.9

2017-01-11 Thread Lars Volker
Hi all,

Do we have an automated way of checking "fix version" fields for
correctness? Out of habit I had put "Impala 2.8.0" there, until I realized
that the 2.8.0-rc had been cut already. I went through my changes, manually
checking whether the fix was included in the rc branch, and set the "fix
version" to "Impala 2.9.0" if it wasn't. Do we have tooling to validate and
adjust the "fix version" automatically?

Similarly I noticed that some issues were created with "target version" =
"Impala 2.8.0", but now their fixes did not make it into 2.8. Is there a
policy what to do with those?

Thanks, Lars


Renaming prefix-named tests (IMPALA-4721)

2017-01-04 Thread Lars Volker
Hi all,

Some of our test names are also prefixes of other tests, e.g.
in tests/metadata/test_ddl.py we have *test_create_table* and
*test_create_table_as_select*. Selecting the former with "impala-py.test -k
test_create_table" will also select the latter.

In the past when I ran into these I renamed the prefix-named test, usually
by adding "_test" to make them unique. However, this is somewhat unintuitive
and consequently needed explanation during reviews. To improve the
situation, I would like to propose changing all affected tests in a single
commit.

Before doing that, I'd like to ask for feedback. I created IMPALA-4721 to
track this.

Thanks, Lars
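The prefix-matching behavior described above comes from `-k` doing substring matching on test names, so it can be illustrated without running pytest (hypothetical test names):

```python
# pytest's "-k EXPR" selects tests whose name contains EXPR as a substring,
# so a test whose name is a prefix of another can never be selected alone.
test_names = [
    "test_create_table",
    "test_create_table_as_select",
]

selected = [t for t in test_names if "test_create_table" in t]
assert len(selected) == 2  # both match, not just the intended one

# After renaming the prefix test as proposed (appending "_test"):
renamed = ["test_create_table_test", "test_create_table_as_select"]
assert [t for t in renamed if "test_create_table_test" in t] == [
    "test_create_table_test"]
```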


Can a committer please carry the +2 on this change?

2016-12-16 Thread Lars Volker
Can a committer please carry the +2 on this change?

https://gerrit.cloudera.org/5051

Patch set 5 had a +2 from Marcel.

Thanks, Lars


Can a committer please carry the +2 on this change?

2016-12-15 Thread Lars Volker
Can a committer please carry the +2 on this change?

https://gerrit.cloudera.org/#/c/5453/

Patch set 2 had a +2 from Dan.

Thanks, Lars


Re: Overriding impala-config.sh values

2016-12-10 Thread Lars Volker
+1

On Thu, Dec 8, 2016 at 1:36 PM, Jim Apple  wrote:

> I like this idea.
>
> On Thu, Dec 8, 2016 at 1:26 PM, Tim Armstrong 
> wrote:
> > Hi All,
> >   I wanted to float an idea to improve the usability of impala-config.sh
> >
> > One problem we've seen a lot is that certain config values, e.g.
> > IMPALA_TOOLCHAIN_BUILD_ID can be overridden by preexisting environment
> > variables. This is useful for testing against alternate components but
> > leads to confusing errors if you re-source a changed impala-config.sh and
> > get a mix of old and new config values. E.g. I've seen multiple people
> run
> > into confusing errors where it looks like files are missing from the
> > toolchain s3 bucket.
> >
> > My idea is that we should remove this overriding mechanism and add an
> > alternative one without the problem based on additional config files.
> > impala-config.sh would determine the default values, which could be
> > overridden by additional config values per-branch or in the local dev
> > environment.
> >
> > My initial idea is to have:
> >
> >   ./bin/impala-config.sh
> >   ./bin/impala-config-branch.sh
> >   ./bin/impala-config-local.sh
> >
> > impala-config-branch.sh would be blank by default and version-controlled,
> > and could be used on release/development branches to override particular
> > config values. This would make it simpler to merge and rebase branches.
> >
> > impala-config-local.sh would be non-existent by default and added to
> > .gitignore. Users can then put whatever values they want for local
> testing.
> >
> > Sourcing impala-config.sh would cause the config to be fully reset,
> > avoiding any staleness issues.
> >
> > What do people think?
> >
> > - Tim
>


[Toolchain-CR] IMPALA-4477: Bump Kudu to latest commit on master (e018a83)

2016-12-09 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change.

Change subject: IMPALA-4477: Bump Kudu to latest commit on master (e018a83)
..


Patch Set 2: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/5460
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6fb47f30dc6c6478a125d5d4df5be11b5797e2df
Gerrit-PatchSet: 2
Gerrit-Project: Toolchain
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs <m...@cloudera.com>
Gerrit-Reviewer: Lars Volker <l...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <m...@cloudera.com>
Gerrit-HasComments: No


[Toolchain-CR] IMPALA-4477: Bump Kudu to latest commit on master (e018a83)

2016-12-09 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change.

Change subject: IMPALA-4477: Bump Kudu to latest commit on master (e018a83)
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5460/1/buildall.sh
File buildall.sh:

Line 240: KUDU_VERSIONS="0.8.0-RC1 0.9.0-RC1 0.10.0-RC1 1.0.0-RC1 f2aeba 
60aa54e a70c905006"
nit: I first thought we might want to standardize the commit hashes on the same 
length, e.g. 7 chars. Then I realized that once a hash makes it in here and is 
referred to in Impala's git, it cannot even be shortened. Maybe add a comment 
that explains this and asks future maintainers to keep these exactly 7 chars
long.


-- 
To view, visit http://gerrit.cloudera.org:8080/5460
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6fb47f30dc6c6478a125d5d4df5be11b5797e2df
Gerrit-PatchSet: 1
Gerrit-Project: Toolchain
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs <m...@cloudera.com>
Gerrit-Reviewer: Lars Volker <l...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <m...@cloudera.com>
Gerrit-HasComments: Yes
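To illustrate the concern in the review comment above: toolchain artifacts are keyed by the exact version string, so a hash recorded in KUDU_VERSIONS cannot later be shortened without breaking lookups. The package-naming helper below is simplified for illustration and not the toolchain's actual naming code:

```python
# Simplified illustration: packages are looked up by the exact version
# string, so "a70c905006" and its 7-char prefix name different artifacts.
def package_name(version, compiler="gcc-4.9.2"):
    return "kudu-%s-%s.tar.gz" % (version, compiler)

recorded = "a70c905006"  # 10-char hash committed to KUDU_VERSIONS
assert package_name(recorded) == "kudu-a70c905006-gcc-4.9.2.tar.gz"

# Shortening the hash later would point at a non-existent artifact:
assert package_name(recorded[:7]) != package_name(recorded)
```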


[Toolchain-CR] IMPALA-4636: Add support for SLES12 for Impala/Kudu integration

2016-12-09 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change.

Change subject: IMPALA-4636: Add support for SLES12 for Impala/Kudu integration
..


Patch Set 2: Code-Review+2

(1 comment)

I don't know enough about the cmake patch, but since we use it for Impala, too, 
it looks good to me.

http://gerrit.cloudera.org:8080/#/c/5454/2/source/kudu/build.sh
File source/kudu/build.sh:

PS2, Line 18: KUDU_VERSION
nit: This should be $PACKAGE_VERSION now I guess.


-- 
To view, visit http://gerrit.cloudera.org:8080/5454
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I87d22028e08b19a1b7f39cb5def115a82560e292
Gerrit-PatchSet: 2
Gerrit-Project: Toolchain
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs <m...@cloudera.com>
Gerrit-Reviewer: Lars Volker <l...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <m...@cloudera.com>
Gerrit-HasComments: Yes


[Toolchain-CR] IMPALA-4636: Add support for SLES12 for Impala/Kudu integration

2016-12-09 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change.

Change subject: IMPALA-4636: Add support for SLES12 for Impala/Kudu integration
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5454/1/source/kudu/build.sh
File source/kudu/build.sh:

PS1, Line 29: KUDU_VERSION
is this still needed, now that the script references $PACKAGE_VERSION?


-- 
To view, visit http://gerrit.cloudera.org:8080/5454
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I87d22028e08b19a1b7f39cb5def115a82560e292
Gerrit-PatchSet: 1
Gerrit-Project: Toolchain
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs <m...@cloudera.com>
Gerrit-Reviewer: Lars Volker <l...@cloudera.com>
Gerrit-HasComments: Yes


Re: Important: new shell & bootstrap required after rebasing

2016-12-09 Thread Lars Volker
If you don't want to mess with your pane layout in tmux you can run

tmux respawn-pane -k

to kill and replace the current pane with an entirely fresh shell.

I bind this to a shortcut in my tmux config like so:

bind K respawn-pane -k

Cheers, Lars

On Wed, Dec 7, 2016 at 2:49 PM, Matthew Jacobs  wrote:

> Sorry, that was a different build I referenced for a newer toolchain.
>
>
> Starting a new shell should work.
>
> Otherwise use:
> export IMPALA_TOOLCHAIN_BUILD_ID=289-f12b0dd2e9
>
> On Wed, Dec 7, 2016 at 2:44 PM, Jim Apple  wrote:
> > I followed these instructions and saw a HTTP 403 error on:
> >
> > https://native-toolchain.s3.amazonaws.com/build/301-
> 05e1fe2b3b/kudu/60aa54e-gcc-4.9.2/kudu-60aa54e-gcc-4.9.2-
> ec2-package-ubuntu-14-04.tar.gz
> >
> > On Tue, Dec 6, 2016 at 9:48 PM, Matthew Jacobs  wrote:
> >> The Kudu version was updated to the latest master. After fetching the
> >> latest Impala commits (including commit 5188f87) and rebasing, you'll
> >> need to:
> >>
> >> 1) start a new shell OR make sure to
> >> export IMPALA_TOOLCHAIN_BUILD_ID=301-05e1fe2b3b
> >>
> >>
> >> 2) Run buildall.sh, that will:
> >>a) download the new Kudu bits in the toolchain by calling
> >>bootstrap_toolchain.py (or you can do so yourself)
> >>b) restart the minicluster with the new kudu
> >>
> >>
> >> We will have to update Kudu a few more times before the release, so
> >> thank you for your patience. Hopefully this makes it as painless as
> >> possible.
>


Re: Impala integration with HDFS erasure coding

2016-11-28 Thread Lars Volker
Hi Andrew,

I'd be interested in the answer to Tim's question, too.

By default, the Impala scheduler prefers local reads over remote reads.
Local reads are scheduled before remote reads, but both take into account
the volume of data that has already been assigned to a backend. Scheduling
does not (yet) take rack locality into account.

Cheers, Lars

On Tue, Nov 22, 2016 at 5:23 AM, Tim Armstrong 
wrote:

> Hi Andrew,
>   I wanted to reply to get the conversation started, although I'm not as
> knowledgeable as others on this topic.
>
> How are the erasure-coded blocks handled by the block locations APIs?
>
> I believe our scheduler just reverts to round-robin if the blocks aren't
> local to a particular daemon (we already do this for S3 and filesystems
> like DSSD and Isilion).
>
> We handle remote reads differently from local reads - we have separate I/O
> queues for each local disk, then a separate remote read queue. It looks
> like we do up to 8 concurrent remote reads by default. It might just work
> out of the box, although I don't know if the current parameters are
> optimal.
>
> On Thu, Nov 17, 2016 at 1:43 PM, Andrew Wang 
> wrote:
>
> > Hi Impala folks,
> >
> > I was wondering if there was any Impala work required to integrate with
> > HDFS erasure coding (planned for release in Hadoop 3, already available
> in
> > alpha form in 3.0.0-alpha1). I know that Impala tries to localize to
> nodes
> > and disks. With EC though, most reads will be remote, so locality isn't
> > important.
> >
> > Is Impala scheduling going to work out-of-the-box?
> >
> > Another idea is to implement a stride-aware data format, which re-enables
> > locality even for striped blocks. It's not clear if this is important
> > though, since EC is meant for cold data that isn't queried often.
> >
> > Thanks,
> > Andrew
> >
>


Re: How to proceed with IMPALA-4086 (Benchmark for SimpleScheduler)

2016-11-16 Thread Lars Volker
Thank you Tim for your response. After talking to Marcel in person I filed
https://issues.cloudera.org/browse/IMPALA-4496 to track this effort
separately.

On Tue, Nov 15, 2016 at 4:14 PM, Tim Armstrong <tarmstr...@cloudera.com>
wrote:

> It sounds like this is a) a lot of work to do initially and b) a lot of
> work to maintain as the thrift data structures evolve.
>
> It seems like benchmarking at that granularity might not be worth the
> hassle. It sounds like the lower-level microbenchmark you've added is maybe
> simpler.
>
> It could be very worthwhile to benchmark the combined planning + scheduling
> process, since that would presumably require less plumbing.
>
> On Fri, Nov 11, 2016 at 4:49 AM, Lars Volker <l...@cloudera.com> wrote:
>
> > Hi all,
> >
> > Here is a change <https://gerrit.cloudera.org/4554> that implements a
> > benchmark for SimpleScheduler::ComputeScanRangeAssignment() to address
> > IMPALA-4086 <https://issues.cloudera.org/browse/IMPALA-4086>.
> >
> > I would like to discuss whether it is possible to run the benchmark
> against
> > the Schedule() method instead. This would require changes to the
> scheduler
> > test utility classes in simple-scheduler-test-util.h to create a
> > TQueryExecRequest message suitable for calling Schedule().
> >
> > Currently we compute these fields before calling
> > ComputeScanRangeAssignment(), which are basically what is contained in a
> > single plan node.
> >
> > BackendConfig
> > > vector
> > > vector
> > > TQueryOptions
> >
> >
> > To build a schedule object we need to build a TQueryExecRequest, which
> has
> > 14 fields. The complex ones are:
> >
> > optional Descriptors.TDescriptorTable desc_tbl
> > > optional list fragments
> > > optional list dest_fragment_idx
> > > optional map<Types.TPlanNodeId, list>
> > > per_node_scan_ranges
> > > optional list mt_plan_exec_info
> > > optional Results.TResultSetMetadata result_set_metadata
> > > optional TFinalizeParams finalize_params
> > > required ImpalaInternalService.TQueryCtx query_ctx
> > > optional string query_plan
> > > required list host_list
> > > optional LineageGraph.TLineageGraph lineage_graph
> >
> >
> > Some of these members have other dependencies, for example the fragments
> > have the plan inside, which has all plan nodes:
> >
> > TQueryExecRequest:
> > >  list fragments
> > >   partition.type
> > >   plan.nodes[node_id]
> > >node_id (for dcheck)
> > >node.hdfs_scan_node (can be unset)
> > >   idx (for sorting in query-schedule)
> > >  TQueryCtx query_ctx (only for query options, which we already have)
> >
> >
> > I think it makes sense to benchmark ComputeScanRangeAssignment() in
> > isolation, since its implementation is reasonably complex, i.e. not just
> > linear in the input size. In order to benchmark Schedule(), we should
> first
> > consider writing proper unit tests for the SimpleScheduler and extend the
> > test utility code where necessary to do so.
> >
> > I'm curious for any feedback. Thanks, Lars
> >
>


Re: compile_commands issues after using distcc?

2016-11-16 Thread Lars Volker
Wild guessing ahead since I'm not using distcc on my German machine, but my
guess would be that you can modify the FlagsForFile() method in your
ycm_extra_conf.py and strip the leading distcc calls before returning the
flags.

On Wed, Nov 16, 2016 at 6:37 PM, Dimitris Tsirogiannis <
dtsirogian...@cloudera.com> wrote:

> I tried it but couldn't make it work. After enabling distcc, the
> compile_commands.json entries no longer call clang but distcc instead, and
> YouCompleteMe doesn't like that. That's as far as I got...
>
> Dimitris
>
> On Wed, Nov 16, 2016 at 9:21 AM, Jim Apple  wrote:
> > bin/run_clang_tidy.sh uses compile_commands.json and it works with
> distcc on.
> >
> > On Wed, Nov 16, 2016 at 9:13 AM, Matthew Jacobs  wrote:
> >> Hi all,
> >>
> >> Do any vim/YouCompleteMe users (or anyone else using the
> >> compile_commands.json) have issues after enabling distcc? I've found
> >> my vim completion (via YouCompleteMe) no longer works.
> >>
> >> I'm curious to know if anyone has any workarounds or suggestions.
> >>
> >> Thanks
>
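A minimal sketch of the workaround suggested above, assuming the usual ycm_extra_conf.py layout; the wrapper list and helper name here are illustrative, not from the thread:

```python
# Hedged sketch: strip leading wrapper commands (distcc, ccache, ...) from a
# compile command so YouCompleteMe sees the real compiler invocation. A
# FlagsForFile() hook could apply this to each entry of compile_commands.json.
WRAPPERS = {"distcc", "ccache", "icecc"}

def strip_wrappers(command):
    """Drop leading wrapper executables from a compile command list."""
    i = 0
    while i < len(command) and command[i].rsplit("/", 1)[-1] in WRAPPERS:
        i += 1
    return command[i:]

assert strip_wrappers(["distcc", "clang++", "-c", "a.cc"]) == \
    ["clang++", "-c", "a.cc"]
assert strip_wrappers(["/usr/bin/ccache", "distcc", "gcc"]) == ["gcc"]
```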


Re: Importance of breakpad component in Impala

2016-11-11 Thread Lars Volker
This one:
https://groups.google.com/forum/#!topic/google-breakpad-discuss/R9ZQ-l6QOSY

It's on breakpad-discuss, sorry for the confusion.

On Fri, Nov 11, 2016 at 6:08 PM, Jim Apple <jbap...@cloudera.com> wrote:

> This thread?
>
> https://groups.google.com/d/topic/google-breakpad-dev/
> NrEcwEnhDps/discussion
>
> On Fri, Nov 11, 2016 at 8:52 AM, Lars Volker <l...@cloudera.com> wrote:
> > Hi Valencia,
> >
> > I assume your question relates to the thread on breakpad-dev. Currently
> > Impala uses breakpad to handle crashes and write minidump files. If you
> > want to remove/disable minidump support on power, look at
> minidump.{h,cc},
> > init.cc and the respective cmake files.
> >
> > Cheers, Lars
> >
> > On Fri, Nov 11, 2016 at 1:31 PM, Valencia Serrao <vser...@us.ibm.com>
> wrote:
> >
> >>
> >> Hi All,
> >>
> >> It would be great if I could know the significance of the breakpad component
> >> (built as part of the native-toolchain) in the working of Impala.
> >>
> >> Regards,
> >> Valencia
> >>
>


Re: Importance of breakpad component in Impala

2016-11-11 Thread Lars Volker
Hi Valencia,

I assume your question relates to the thread on breakpad-dev. Currently
Impala uses breakpad to handle crashes and write minidump files. If you
want to remove/disable minidump support on power, look at minidump.{h,cc},
init.cc and the respective cmake files.

Cheers, Lars

On Fri, Nov 11, 2016 at 1:31 PM, Valencia Serrao  wrote:

>
> Hi All,
>
> It would be great if I could know the significance of the breakpad component
> (built as part of the native-toolchain) in the working of Impala.
>
> Regards,
> Valencia
>


How to proceed with IMPALA-4086 (Benchmark for SimpleScheduler)

2016-11-11 Thread Lars Volker
Hi all,

Here is a change that implements a
benchmark for SimpleScheduler::ComputeScanRangeAssignment() to address
IMPALA-4086.

I would like to discuss whether it is possible to run the benchmark against
the Schedule() method instead. This would require changes to the scheduler
test utility classes in simple-scheduler-test-util.h to create a
TQueryExecRequest message suitable for calling Schedule().

Currently we compute these fields before calling
ComputeScanRangeAssignment(), which are basically what is contained in a
single plan node.

BackendConfig
> vector
> vector
> TQueryOptions


To build a schedule object we need to build a TQueryExecRequest, which has
14 fields. The complex ones are:

optional Descriptors.TDescriptorTable desc_tbl
> optional list fragments
> optional list dest_fragment_idx
> optional map
> per_node_scan_ranges
> optional list mt_plan_exec_info
> optional Results.TResultSetMetadata result_set_metadata
> optional TFinalizeParams finalize_params
> required ImpalaInternalService.TQueryCtx query_ctx
> optional string query_plan
> required list host_list
> optional LineageGraph.TLineageGraph lineage_graph


Some of these members have other dependencies, for example the fragments
have the plan inside, which has all plan nodes:

TQueryExecRequest:
>  list fragments
>   partition.type
>   plan.nodes[node_id]
>node_id (for dcheck)
>node.hdfs_scan_node (can be unset)
>   idx (for sorting in query-schedule)
>  TQueryCtx query_ctx (only for query options, which we already have)


I think it makes sense to benchmark ComputeScanRangeAssignment() in
isolation, since its implementation is reasonably complex, i.e. not just
linear in the input size. In order to benchmark Schedule(), we should first
consider writing proper unit tests for the SimpleScheduler and extend the
test utility code where necessary to do so.

I'm curious for any feedback. Thanks, Lars


Re: clang-tidy

2016-11-08 Thread Lars Volker
Thank you Jim for working on this!

On Fri, Nov 4, 2016 at 5:38 PM, Jim Apple  wrote:

> clang-tidy is a nice tool for catching likely bugs and definitely poor
> syntax. You can use it on your latest patch with:
>
> git diff asf-gerrit/master |
> "${IMPALA_TOOLCHAIN}/llvm-${IMPALA_LLVM_VERSION}/share/
> clang/clang-tidy-diff.py"
> -clang-tidy-binary
> "${IMPALA_TOOLCHAIN}/llvm-${IMPALA_LLVM_VERSION}/bin/clang-tidy" -p 1
>
> You can check the whole repo with:
>
> bin/run_clang_tidy.sh
>
> The latter produces a lot of output and takes 10-15 minutes. Warnings
> are lines that end in ']'.
>
> Time permitting, I will expand the checks we use and try to integrate
> this into tests or code review tools. I added it to the wiki:
>
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=65868536
>


[Impala-CR](cdh5-2.6.0 5.8.0) PREVIEW: IMPALA-4223: Handle truncated file read from HDFS cache

2016-11-03 Thread Lars Volker (Code Review)
Lars Volker has abandoned this change.

Change subject: PREVIEW: IMPALA-4223: Handle truncated file read from HDFS cache
..


Abandoned

-- 
To view, visit http://gerrit.cloudera.org:8080/4645
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: abandon
Gerrit-Change-Id: Id1e1fdb0211819c5938956abb13b512350a46f1a
Gerrit-PatchSet: 1
Gerrit-Project: Impala
Gerrit-Branch: cdh5-2.6.0_5.8.0
Gerrit-Owner: Lars Volker <l...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bhara...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jbap...@cloudera.com>
Gerrit-Reviewer: Lars Volker <l...@cloudera.com>


[Impala-CR](cdh5-2.6.0 5.8.0) PREVIEW: IMPALA-4223: Handle truncated file read from HDFS cache

2016-11-03 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change.

Change subject: PREVIEW: IMPALA-4223: Handle truncated file read from HDFS cache
..


Patch Set 1:

(1 comment)

> This should probably go in ASF's master branch, then get
 > cherry-picked.

Yes. This has been reviewed and merged into master here: 
https://gerrit.cloudera.org/#/c/4828/

I will coordinate the backporting efforts in the Jira.

http://gerrit.cloudera.org:8080/#/c/4645/1/be/src/runtime/disk-io-mgr-scan-range.cc
File be/src/runtime/disk-io-mgr-scan-range.cc:

PS1, Line 438: hadoopRzBufferFree(hdfs_file_->file(), cached_buffer_);
 : cached_buffer_ = NULL;
 : // Close file that was opened in Open().
 : io_mgr_->CacheOrCloseFileHandle(file(), hdfs_file_, false);
 : hdfs_file_ = NULL;
 : stringstream ss;
> Can't we just call ScanRange::Close()?
Done.


-- 
To view, visit http://gerrit.cloudera.org:8080/4645
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id1e1fdb0211819c5938956abb13b512350a46f1a
Gerrit-PatchSet: 1
Gerrit-Project: Impala
Gerrit-Branch: cdh5-2.6.0_5.8.0
Gerrit-Owner: Lars Volker <l...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bhara...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jbap...@cloudera.com>
Gerrit-Reviewer: Lars Volker <l...@cloudera.com>
Gerrit-HasComments: Yes


Re: Bootstrapping a impala dev failed on fresh installed Ubuntu 14.04

2016-11-02 Thread Lars Volker
Yes, this is already committed to the impala-setup repo and I used it
yesterday on a fresh Ubuntu 14.04 machine with success.

Amos, after running impala-setup you will need to re-login to make sure the
changes made to the system limits are effective. You can check them by
running "ulimit -n" in your shell.

On Wed, Nov 2, 2016 at 5:48 AM, Jim Apple  wrote:

> Isn't that already part of the script?
>
> https://github.com/awleblang/impala-setup/commit/
> 56fa829c99e997585eb63fcd49cb65eb8357e679
>
> https://git-wip-us.apache.org/repos/asf?p=incubator-impala.
> git;a=blob;f=bin/bootstrap_development.sh;h=8c4f742ae058f8017858d2a749e882
> 4be58bd410;hb=HEAD#l68
>
> On Tue, Nov 1, 2016 at 9:44 PM, Dimitris Tsirogiannis
>  wrote:
> > Hi Amos,
> >
> > You need to increase your limits (/etc/security/limits.conf) for max
> > number of open files (nofile). Use a pretty big number (e.g. 500K) for
> > both soft and hard.
> >
> > Hope that helps.
> >
> > Dimitris
> >
> > On Tue, Nov 1, 2016 at 8:57 PM, Amos Bird  wrote:
> >>
> >> Hi there,
> >>
> >> After days of efforts to make impala's local tests work on my Centos
> >> machine, I finally gave up and turns to Ubuntu. I followed this simple
> >> guide
> >> https://cwiki.apache.org/confluence/display/IMPALA/
> Bootstrapping+an+Impala+Development+Environment+From+Scratch
> >> on a fresh installed Ubuntu 14.04. Unfortunately there are still errors
> >> in loading data phase. Here is the error log,
> >>
> >> 
> -
> >> Loading Kudu TPCH (logging to 
> >> /home/amos/impala/logs/data_loading/load-kudu-tpch.log)...
> FAILED
> >> 'load-data tpch core kudu/none/none force' failed. Tail of log:
> >> distribute by hash (c_custkey) into 9 buckets stored as kudu
> >>
> >> (load-tpch-core-impala-generated-kudu-none-none.sql):
> >>
> >>
> >> Executing HBase Command: hbase shell load-tpch-core-hbase-
> generated.create
> >> 16/11/02 01:07:58 INFO Configuration.deprecation: hadoop.native.lib is
> deprecated. Instead, use io.native.lib.available
> >> SLF4J: Class path contains multiple SLF4J bindings.
> >> SLF4J: Found binding in [jar:file:/home/amos/impala/
> toolchain/cdh_components/hbase-1.2.0-cdh5.10.0-
> SNAPSHOT/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/
> StaticLoggerBinder.class]
> >> SLF4J: Found binding in [jar:file:/home/amos/impala/
> toolchain/cdh_components/hadoop-2.6.0-cdh5.10.0-
> SNAPSHOT/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/
> StaticLoggerBinder.class]
> >> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an
> explanation.
> >> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
> >> Executing HBase Command: hbase shell post-load-tpch-core-hbase-
> generated.sql
> >> 16/11/02 01:08:03 INFO Configuration.deprecation: hadoop.native.lib is
> deprecated. Instead, use io.native.lib.available
> >> SLF4J: Class path contains multiple SLF4J bindings.
> >> SLF4J: Found binding in [jar:file:/home/amos/impala/
> toolchain/cdh_components/hbase-1.2.0-cdh5.10.0-
> SNAPSHOT/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/
> StaticLoggerBinder.class]
> >> SLF4J: Found binding in [jar:file:/home/amos/impala/
> toolchain/cdh_components/hadoop-2.6.0-cdh5.10.0-
> SNAPSHOT/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/
> StaticLoggerBinder.class]
> >> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an
> explanation.
> >> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
> >> Invalidating Metadata
> >> (load-tpch-core-impala-load-generated-kudu-none-none.sql):
> >> INSERT INTO TABLE tpch_kudu.lineitem SELECT * FROM tpch.lineitem
> >>
> >> Data Loading from Impala failed with error: ImpalaBeeswaxException:
> >>  Query aborted:
> >> Kudu error(s) reported, first error: Timed out: Failed to write batch
> of 2708 ops to tablet 84aa134fb6c24916aa16cf50f48ec557 after 329
> attempt(s): Failed to write to server: (no server available): Write(tablet:
> 84aa134fb6c24916aa16cf50f48ec557, num_ops: 2708, num_attempts: 329)
> passed its deadline: Network error: recv error: Connection reset by peer
> (error 104)
> >>
> >>
> >>
> >> Kudu error(s) reported, first error: Timed out: Failed to write batch
> of 2708 ops to tablet 84aa134fb6c24916aa16cf50f48ec557 after 329
> attempt(s): Failed to write to server: (no server available): Write(tablet:
> 84aa134fb6c24916aa16cf50f48ec557, num_ops: 2708, num_attempts: 329)
> passed its deadline: Network error: recv error: Connection reset by peer
> (error 104)
> >> Error in Kudu table 'impala::tpch_kudu.lineitem': Timed out: Failed to
> write batch of 2708 ops to tablet 84aa134fb6c24916aa16cf50f48ec557 after
> 329 attempt(s): Failed to write to server: (no server available):
> Write(tablet: 84aa134fb6c24916aa16cf50f48ec557, num_ops: 2708,
> num_attempts: 329) passed its deadline: Network error: 

Re: c++ includes

2016-10-28 Thread Lars Volker
I remember using this tool together with fix_includes and the most annoying
issue was that it would remove high-level includes (e.g.
boost/unordered_map) and replace them with all the headers from inside that
include instead, making it very hard to use.

This would sometimes turn #include  into

...
#include 
#include 
#include 
...



On Thu, Oct 27, 2016 at 10:08 PM, Alex Behm  wrote:

> I recently added a script to run cppclean over the Impala BE:
>
> https://pypi.python.org/pypi/cppclean
>
> You can run it in IMPALA_HOME/bin/cppclean.sh
>
> I don't think cppclean is as sophisticated as include-what-you-use, but it
> might be an easy start.
>
> On Thu, Oct 27, 2016 at 3:59 PM, Jim Apple  wrote:
>
> > I used it before fix_includes was available. It found a lot of issues,
> > but I hated having to fix them. I suspect that it is much, much more
> > usable now.
> >
> > On Thu, Oct 27, 2016 at 3:51 PM, Marcel Kornacker 
> > wrote:
> > > Does anyone have experience with this tool for analyzing C++ includes?
> > > http://include-what-you-use.org/
> > >
> > > I have a feeling that our development productivity is somewhat
> > > impacted by too-generous includes (instead of forward declarations),
> > > and getting some (mechanized) help for cleaning that up would be
> > > useful.
> > >
> > > This would also be a positive newbie contribution.
> >
>


Re: Unable to start Hive with testdata.bin/run-all.sh

2016-10-26 Thread Lars Volker
Could this be the error where your local test minicluster is not
initialized correctly? It will fail to use postgres and erroneously try to
use derby instead.

I recently started running ./bin/create-test-configuration.sh to resolve
issues with the test cluster. I think you can also add -start_minicluster
to buildall.sh.

On Wed, Oct 26, 2016 at 10:36 AM, Jim Apple  wrote:

> I am seeing the error below when trying to get my development
> environment up and running. Has anyone else seen and worked around
> this before?
>
> 2016-10-26 10:27:16,719 ERROR Datastore.Schema
> (Log4JLogger.java:error(125)) - Failed initialising database.
> Unable to open a test connection to the given database. JDBC url =
> jdbc:derby:;databaseName=metastore_db;create=true, username = APP.
> Terminating connection pool (set lazyInit to true if you expect to
> start your database after your app). Original Exception: --
> java.sql.SQLException: Failed to start database 'metastore_db' with
> class loader sun.misc.Launcher$AppClassLoader@3ad6a0e0, see the next
> exception for details.
> at org.apache.derby.impl.jdbc.SQLExceptionFactory.
> getSQLException(Unknown
> Source)
> at org.apache.derby.impl.jdbc.SQLExceptionFactory.
> getSQLException(Unknown
> Source)
> at org.apache.derby.impl.jdbc.Util.seeNextException(Unknown
> Source)
> at org.apache.derby.impl.jdbc.EmbedConnection.bootDatabase(Unknown
> Source)
> at org.apache.derby.impl.jdbc.EmbedConnection.<init>(Unknown
> Source)
> [MANY MORE LINES]
> at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
> Caused by: ERROR XJ040: Failed to start database 'metastore_db' with
> class loader sun.misc.Launcher$AppClassLoader@3ad6a0e0, see the next
> exception for details.
> at org.apache.derby.iapi.error.StandardException.
> newException(Unknown
> Source)
> at org.apache.derby.impl.jdbc.SQLExceptionFactory.
> wrapArgsForTransportAcrossDRDA(Unknown
> Source)
> ... 84 more
> Caused by: ERROR XSDB6: Another instance of Derby may have already
> booted the database /home/jbapple/Impala/metastore_db.
> at org.apache.derby.iapi.error.StandardException.
> newException(Unknown
> Source)
> at org.apache.derby.iapi.error.StandardException.
> newException(Unknown
> Source)
> [MANY MORE LINES]
>


How can I retain a config file in be/ ?

2016-10-25 Thread Lars Volker
My editor (vim) takes a config file in be/ (be/.ycm_extra_conf.py). I
recently pushed a change  to include
it in the .gitignore file, making it survive runs of "git clean -fdx".
However, I have now found that clean.sh also calls "git clean -fdX"
(uppercase X), which removes everything that is listed in .gitignore.

Is there a way to clean the repository while obeying the rules in
.gitignore? Can we change the -x to a -X? Is there any other workaround to
retain this config file, short of linking it again every time I run
clean.sh (or buildall)?

Thanks for the help, Lars
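One possible workaround (a sketch only, not tested against Impala's actual clean.sh; the throwaway repo and file names below are illustrative): `git clean` accepts `-e` exclude patterns, and those are still honored even when `-x` disables the standard .gitignore rules, so a single config file can survive a full clean.

```shell
# Sketch: protect one ignored file from "git clean -fdx" using -e.
# The temporary repo and the build_artifact.o name are made up for this demo.
repo=$(mktemp -d)
cd "$repo"
git init -q .
echo '.ycm_extra_conf.py' > .gitignore
touch .ycm_extra_conf.py build_artifact.o
# -x ignores the .gitignore rules, but -e patterns are still honored:
git clean -fdx -e .ycm_extra_conf.py -e .gitignore
# the excluded files survive; build_artifact.o is gone (.git also remains)
ls -A
```

If clean.sh itself can be changed, passing `-e` patterns through would avoid re-linking the file after every clean.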


Need to re-generate functional_kudu tables

2016-10-24 Thread Lars Volker
After this change  I needed to
re-generate the tables in functional_kudu. I did so running this command:

./bin/load-data.py -w functional-planner --table_formats=kudu/none --force

Afterwards I had to recompute stats for two of the tables to get the
PlannerTest to pass:

[localhost:21000] > compute stats functional_kudu.zipcode_incomes;
[localhost:21000] > compute stats functional_kudu.testtbl;

You might need to compute stats for the other tables, too.

Thanks Dimitris for helping me with this. Lars


Cannot push new patch set to Gerrit

2016-10-24 Thread Lars Volker
I tried to push another patch set to Gerrit and got the error below. This
worked about an hour ago and has now stopped working. The local parent commit
is the same as the remote one, and I didn't rebase. Has anyone seen this
before? I suspect I could rebase, but then the review gets much more tedious
to follow.

Thanks, Lars

✔ ~/i2(i2521) ✗$ git push asf HEAD:refs/for/master
To ssh://gerrit.cloudera.org:29418/Impala-ASF
 ! [rejected]        HEAD -> refs/for/master (non-fast-forward)
error: failed to push some refs to 'ssh://
l...@gerrit.cloudera.org:29418/Impala-ASF'
hint: Updates were rejected because a pushed branch tip is behind its remote
hint: counterpart. Check out this branch and integrate the remote changes
hint: (e.g. 'git pull ...') before pushing again.
hint: See the 'Note about fast-forwards' in 'git push --help' for details.
✗ ~/i2(i2521) ✗$
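One generic way to diagnose this state before pushing (a plain-git sketch, not Gerrit-specific; "remote-tip" and the commit messages below are stand-ins): check whether the remote tip is an ancestor of your local HEAD. The throwaway repo simulates the case where a push would fast-forward.

```shell
# Sketch: a push fast-forwards only if the remote tip is an ancestor of HEAD.
# "remote-tip" stands in for the real remote branch (e.g. asf/master).
repo=$(mktemp -d)
cd "$repo"
git init -q .
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "shared base"
git branch remote-tip                  # remote is at the shared base
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "local work on top"
# Exit status 0 means a plain push would be a fast-forward and be accepted.
git merge-base --is-ancestor remote-tip HEAD && echo "fast-forward ok"
```

If the check fails, the remote branch has commits your HEAD lacks, which produces exactly the "pushed branch tip is behind its remote counterpart" rejection above.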


Re: test errors in local env

2016-10-21 Thread Lars Volker
Hi Amos,

apologies for the delay. Is this issue still open? If so, can you try to
run the tests with Java 7?

Thanks, Lars

On Thu, Sep 22, 2016 at 6:38 PM, Amos Bird  wrote:

> Thanks Lars,
>
> My Java version is
>
> $ $JAVA_HOME/bin/java -version
> java version "1.8.0_65"
> Java(TM) SE Runtime Environment (build 1.8.0_65-b17)
> Java HotSpot(TM) 64-Bit Server VM (build 25.65-b01, mixed mode)
>
> > Hi Amos,
> >
> > I tried to see if IMPALA-3643
> >  is still reproducible
> for
> > me. In the process I also ran into the "NoClassDefFoundError" error you
> > saw and found out that this happens if my local hadoop services
> > (./testdata/bin/run-all.sh) aren't running.
> >
> > Once I have the local hadoop services running I'm able to repro the issue
> > in IMPALA-3643. The only thing I remember doing differently from the
> > recommended ways of dev-machine setup is using Java 8. The Java 8 docs on
> > LinkedHashSet
> > 
> don't
> > read like they'd change the ordering, however it doesn't say that the
> > ordering has to be stable *between different versions of Java*. I will
> try
> > to confirm that this issue does not repro with Java 7 when I have time.
> Can
> > you check the Java version you're running. Mine is this:
> >
> > $ $JAVA_HOME/bin/java -version
> > java version "1.8.0_101"
> > Java(TM) SE Runtime Environment (build 1.8.0_101-b13)
> > Java HotSpot(TM) 64-Bit Server VM (build 25.101-b13, mixed mode)
> >
> > Alex, could a difference in the implementation between Java 7 and 8 be a
> > reasonable explanation for this?
> >
> > Cheers, Lars
> >
> > On Thu, Sep 22, 2016 at 12:52 AM, Amos Bird  wrote:
> >
> >>
> >> Thank you for helping me out :D
> >>
> >> > Thanks for keeping this discussion going.
> >> >
> >> > On Sun, Sep 18, 2016 at 8:13 AM, Amos Bird 
> wrote:
> >> >
> >> >>
> >> >> > On Fri, Sep 16, 2016 at 9:06 PM, Amos Bird 
> >> wrote:
> >> >> >
> >> >> >>
> >> >> >> Hi there,
> >> >> >>
> >> >> >> I followed the wiki
> >> >> >> https://cwiki.apache.org/confluence/display/IMPALA/How+
> >> >> >> to+load+and+run+Impala+tests
> >> >> >> carefully but still have some problems in my local env.
> >> >> >>
> >> >> >> 1. I need to manually execute "hdfs dfs -mkdir
> >> >> /test-warehouse/emptytable"
> >> >> >> to get rid of some fe test error.
> >> >> >>
> >> >> >>
> >> >> > Ideally, you should not have to do this. Could you tell me what
> errors
> >> >> you
> >> >> > encountered? Sounds like there may be a test or data loading bug we
> >> >> should
> >> >> > fix.
> >> >>
> >> >> The error is :
> >> >>
> >> >> TestLoadData(com.cloudera.impala.analysis.AnalyzeStmtsTest)  Time
> >> >> elapsed: 0.033 sec  <<< FAILURE!
> >> >> java.lang.AssertionError: got error:
> >> >> INPATH location 'hdfs://localhost:20500/test-warehouse/emptytable'
> does
> >> >> not exist.
> >> >> expected:
> >> >> INPATH location 'hdfs://localhost:20500/test-warehouse/emptytable'
> >> >> contains no visible files.
> >> >>   at org.junit.Assert.fail(Assert.java:88)
> >> >>   at org.junit.Assert.assertTrue(Assert.java:41)
> >> >>   at com.cloudera.impala.common.FrontendTestBase.AnalysisError(
> >> >> FrontendTestBase.java:312)
> >> >>   at com.cloudera.impala.common.FrontendTestBase.AnalysisError(
> >> >> FrontendTestBase.java:292)
> >> >>   at com.cloudera.impala.analysis.AnalyzeStmtsTest.TestLoadData(
> >> >> AnalyzeStmtsTest.java:2860)
> >> >>
> >> >>
> >> > Do you have a table functional.emptytable? If yes, then what location
> is
> >> > reported in "show create table"?
> >> Query: show create table functional.emptytable
> >> +-+
> >> | result  |
> >> +-+
> >> | CREATE EXTERNAL TABLE functional.emptytable (   |
> >> |   field STRING  |
> >> | )   |
> >> | PARTITIONED BY (|
> >> |   f2 INT|
> >> | )   |
> >> | STORED AS TEXTFILE  |
> >> | LOCATION 'hdfs://localhost:20500/test-warehouse/emptytable' |
> >> | TBLPROPERTIES ('transient_lastDdlTime'='1464782625')|
> >> +-+
> >> Fetched 1 row(s) in 5.51s
> >>
> >> > Does the directory exist in HDFS?
> >> No.
> >>
> >> >
> >> > You could try to manually reload the table and see if the directory is
> >> > created:
> >> > bin/load-data.py -f -w functional-query --table_names=emptytable
> >> > --table_formats=text/none
> >> After executing this command the 

Re: Backporting error codes

2016-10-20 Thread Lars Volker
Filed https://issues.cloudera.org/browse/IMPALA-4331 to track this.

On Thu, Oct 20, 2016 at 4:41 PM, Tim Armstrong <tarmstr...@cloudera.com>
wrote:

> FWIW there's no specific reason that we don't allow gaps in the error code.
> There was a bug in the generator script that meant it didn't handle them
> properly, so I turned that off at one point, but it may not be a very hard
> fix to support it.
>
> On Thu, Oct 20, 2016 at 4:39 PM, Sailesh Mukil <sail...@cloudera.com>
> wrote:
>
> > Of the given choices, I would choose option 2. Not sure if there's a
> better
> > way. Maybe we could add a "// backported. not used" comment next to each
> > unused error code?
> >
> > On Thu, Oct 20, 2016 at 4:24 PM, Lars Volker <l...@cloudera.com> wrote:
> >
> > > Compilation fails if they are not: Numeric error codes must start from
> 0,
> > > be in order, and not have any gaps: got 94, expected 91
> > >
> > > Since I'm backporting a fix it feels dangerous to remove that
> restriction
> > > in the process.
> > >
> > > On Thu, Oct 20, 2016 at 4:16 PM, Henry Robinson <he...@cloudera.com>
> > > wrote:
> > >
> > > > How about not changing the code? Is there any reason they have to be
> > > > gapless?
> > > >
> > > > On 20 October 2016 at 16:12, Lars Volker <l...@cloudera.com> wrote:
> > > >
> > > > > When backporting a change that introduced a new error code to an
> > older
> > > > > version Impala there seem to be two options to prevent gaps in the
> > > error
> > > > > codes:
> > > > >
> > > > >
> > > > >- Change the error code number during the backport. This will
> > result
> > > > in
> > > > >different error codes between versions
> > > > >- Backport all new error codes that have been introduced prior
> to
> > > that
> > > > >change, so that the error code stays the same.
> > > > >
> > > > > Are there other alternatives? Which way should I go?
> > > > >
> > > > > Thanks, Lars
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Henry Robinson
> > > > Software Engineer
> > > > Cloudera
> > > > 415-994-6679
> > > >
> > >
> >
>


Re: Bulk JIRA email, or how to find good newbie tasks

2016-10-20 Thread Lars Volker
I think that could be expressed by having a target version set (either a
version or the product backlog) but no newbie tag. In my understanding,
triaged status is indicated by the target version.

On Thu, Oct 20, 2016 at 4:28 PM, Jim Apple  wrote:

> So you're suggesting that there be no label for "this is triaged and
> not suitable for newbies"?
>
> On Thu, Oct 20, 2016 at 4:14 PM, Henry Robinson 
> wrote:
> > I would strongly prefer not adding "non-newbie". It seems to have limited
> > use, and is another way to increase the state space of JIRA labels,
> > components, etc, etc.
> >
> > Let's just find some good first-patch candidates and tag them.
> >
> > On 20 October 2016 at 16:03, Jim Apple  wrote:
> >
> >> Since I have heard no objection to these tag names, I'm picking
> >> "newbie" and "non-newbie". I am going to start emailing some PMC
> >> members to ask them to categorize issues.
> >>
> >> On Tue, Oct 18, 2016 at 9:57 AM, Jim Apple 
> wrote:
> >> > How shall we distinguish between the following three classes of
> issues:
> >> >
> >> > 1. Un-triaged ramp-up issues
> >> > 2. ramp-up issues that are not for newbies
> >> > 3. newbie issues
> >> >
> >> > We could add two tags, "newbie" and "non-newbie". We could call the
> >> > second tag something other than "non-newbie", like "second-patch" or
> >> > "sophomore".
> >> >
> >> > Thoughts?
> >> >
> >> > On Mon, Oct 17, 2016 at 10:27 PM, Marcel Kornacker <
> mar...@cloudera.com>
> >> wrote:
> >> >> Please don't add that comment. :)
> >> >>
> >> >> What's currently labelled ramp-up is often not a good newbie task
> (and
> >> >> maybe not even a good ramp-up task). The best way to identify newbie
> >> >> tasks is for a few senior engineers to sift through the ramp-up tasks
> >> >> and pick out maybe a few dozen that truly qualify as newbie tasks.
> >> >>
> >> >> I'm happy to help out with that when I get back.
> >> >>
> >> >> On Mon, Oct 17, 2016 at 6:50 PM, Jim Apple 
> >> wrote:
> >> >>> The Impala JIRA has 129 tasks that have no assignee, are still open,
> >> >>> and are labelled ramp* (i.e. ramp-up, ramp-up-introductory, etc.).
> >> >>>
> >> >>> I'd like to find which of those tasks are good tasks for someone who
> >> >>> is making their first Impala patch. I intend to promote those on one
> >> >>> or more of : the blog, the twitter account, this list, the user
> list,
> >> >>> helpwanted.apache.org, and so on.
> >> >>>
> >> >>> The tasks should be the kind of thing that someone won't need too
> much
> >>> hand-holding on, once they have their dev environment up and
> working.
> >> >>>
> >> >>> To do this, I was thinking of adding a comment to all 129 tasks to
> ask
> >> >>> the watchers of each issue if it should be labelled "newbie". This
> >> >>> will send hundreds of emails, which is a bummer, but it seems to me
> >> >>> like the best way to track the discussions and decisions.
> >> >>>
> >> >>> What does everyone think?
> >>
> >
> >
> >
> > --
> > Henry Robinson
> > Software Engineer
> > Cloudera
> > 415-994-6679
>


Re: Backporting error codes

2016-10-20 Thread Lars Volker
Compilation fails if they are not: Numeric error codes must start from 0,
be in order, and not have any gaps: got 94, expected 91

Since I'm backporting a fix it feels dangerous to remove that restriction
in the process.

On Thu, Oct 20, 2016 at 4:16 PM, Henry Robinson <he...@cloudera.com> wrote:

> How about not changing the code? Is there any reason they have to be
> gapless?
>
> On 20 October 2016 at 16:12, Lars Volker <l...@cloudera.com> wrote:
>
> > When backporting a change that introduced a new error code to an older
> > version Impala there seem to be two options to prevent gaps in the error
> > codes:
> >
> >
> >- Change the error code number during the backport. This will result
> in
> >different error codes between versions
> >- Backport all new error codes that have been introduced prior to that
> >change, so that the error code stays the same.
> >
> > Are there other alternatives? Which way should I go?
> >
> > Thanks, Lars
> >
>
>
>
> --
> Henry Robinson
> Software Engineer
> Cloudera
> 415-994-6679
>


Backporting error codes

2016-10-20 Thread Lars Volker
When backporting a change that introduced a new error code to an older
version Impala there seem to be two options to prevent gaps in the error
codes:


   - Change the error code number during the backport. This will result in
   different error codes between versions
   - Backport all new error codes that have been introduced prior to that
   change, so that the error code stays the same.

Are there other alternatives? Which way should I go?

Thanks, Lars
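The gapless constraint discussed in this thread can be sanity-checked mechanically. The function below is a hypothetical stand-in for the generator script's validation (the real script lives in the Impala repo and may differ): codes must be consecutive starting from 0, so a backported code that keeps its original, higher number fails the check.

```shell
# Hypothetical re-creation of the generator's gap check: error codes must
# start from 0, be in order, and have no gaps.
check_gapless() {
  expected=0
  for code in "$@"; do
    if [ "$code" -ne "$expected" ]; then
      echo "got $code, expected $expected" >&2
      return 1
    fi
    expected=$((expected + 1))
  done
  return 0
}
check_gapless 0 1 2 3 && echo "codes ok"
check_gapless 0 1 2 94 || echo "gap detected"
```

This is why option 1 (renumbering during the backport) keeps the build working but diverges the numeric codes between versions, while option 2 keeps the numbers stable at the cost of backporting intermediate codes.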


[Impala-CR](cdh5-2.6.0 5.8.0) PREVIEW IMPALA-4223: Fix buffer handling in ScannerContext

2016-10-18 Thread Lars Volker (Code Review)
Lars Volker has abandoned this change.

Change subject: PREVIEW IMPALA-4223: Fix buffer handling in ScannerContext
..


Abandoned

This seems to fix a non-existent issue (i.e. my analysis was flawed) :)

-- 
To view, visit http://gerrit.cloudera.org:8080/4610
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: abandon
Gerrit-Change-Id: Iff019e15dc89881310914dbfc6ca255d51431110
Gerrit-PatchSet: 1
Gerrit-Project: Impala
Gerrit-Branch: cdh5-2.6.0_5.8.0
Gerrit-Owner: Lars Volker <l...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com>


Re: Bulk JIRA email, or how to find good newbie tasks

2016-10-17 Thread Lars Volker
Cool, it seems very helpful to me to have such a list of easy tasks.

I think that watchers of these issues could often be interested in a fix,
but might not necessarily know how much effort these issues take. Would it
be an alternative to ask people familiar with various parts of the codebase
for their feedback? You could make a list of Jira-searches (or a dashboard
with a pie chart) for each "component" (be, tests, fe, infra), and send
those around, so people just need to click and can go through the list,
identifying issues they think are newbie-friendly.

We could also tag those with the language involved, so people who want to
work in a particular programming language (Python, Java, C++) can find them
easily.

On Mon, Oct 17, 2016 at 9:50 AM, Jim Apple  wrote:

> The Impala JIRA has 129 tasks that have no assignee, are still open,
> and are labelled ramp* (i.e. ramp-up, ramp-up-introductory, etc.).
>
> I'd like to find which of those tasks are good tasks for someone who
> is making their first Impala patch. I intend to promote those on one
> or more of : the blog, the twitter account, this list, the user list,
> helpwanted.apache.org, and so on.
>
> The tasks should be the kind of thing that someone won't need too much
> hand-holding on, once they have their dev environment up and working.
>
> To do this, I was thinking of adding a comment to all 129 tasks to ask
> the watchers of each issue if it should be labelled "newbie". This
> will send hundreds of emails, which is a bummer, but it seems to me
> like the best way to track the discussions and decisions.
>
> What does everyone think?
>


[Impala-CR](cdh5-2.6.0 5.8.0) PREVIEW: IMPALA-4223: Handle truncated file read from HDFS cache

2016-10-14 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change.

Change subject: PREVIEW: IMPALA-4223: Handle truncated file read from HDFS cache
..


Patch Set 1:

> > This should probably go in ASF's master branch, then get
 > > cherry-picked.
 > 
 > Any thoughts on this, Lars?

Yes, that sounds good to me. I asked everyone on the Jira for help with the 
original issue, as well as with reviewing this change, but haven't received any 
feedback yet on whether we want to move forward with this fix. However, I will 
go ahead, write a proper commit message, and re-upload it to master.

-- 
To view, visit http://gerrit.cloudera.org:8080/4645
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id1e1fdb0211819c5938956abb13b512350a46f1a
Gerrit-PatchSet: 1
Gerrit-Project: Impala
Gerrit-Branch: cdh5-2.6.0_5.8.0
Gerrit-Owner: Lars Volker <l...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jbap...@cloudera.com>
Gerrit-Reviewer: Lars Volker <l...@cloudera.com>
Gerrit-HasComments: No

