Re: [VOTE] Graduate to a TLP

2017-10-17 Thread Marcel Kornacker
+1

On Tue, Oct 17, 2017 at 9:30 PM, Jeszy  wrote:
> +1
>
> On 18 October 2017 at 03:40, Tim Armstrong  wrote:
>> +1
>>
>> On 17 Oct. 2017 8:38 pm, "Alexander Behm"  wrote:
>>
>>> +1
>>>
>>> On Tue, Oct 17, 2017 at 8:18 PM, Taras Bobrovytsky 
>>> wrote:
>>>
>>> > +1
>>> >
>>> > On Tue, Oct 17, 2017 at 7:56 PM, Michael Ho  wrote:
>>> >
>>> > > +1
>>> > >
>>> > > On Tue, Oct 17, 2017 at 7:25 PM, Thomas Tauber-Marshall <
>>> > > tmarsh...@cloudera.com> wrote:
>>> > >
>>> > > > +1
>>> > > >
>>> > > > On Tue, Oct 17, 2017 at 9:12 PM Bharath Vissapragada <
>>> > > > bhara...@cloudera.com>
>>> > > > wrote:
>>> > > >
>>> > > > > +1
>>> > > > >
>>> > > > > On Tue, Oct 17, 2017 at 7:10 PM, Mostafa Mokhtar <
>>> > > mmokh...@cloudera.com>
>>> > > > > wrote:
>>> > > > >
>>> > > > > > +1
>>> > > > > >
>>> > > > > > Thanks
>>> > > > > > Mostafa
>>> > > > > >
>>> > > > > > > On Oct 17, 2017, at 7:09 PM, Brock Noland 
>>> > wrote:
>>> > > > > > >
>>> > > > > > > +1
>>> > > > > > >
>>> > > > > > >> On Tue, Oct 17, 2017 at 9:07 PM, Lars Volker >> >
>>> > > > wrote:
>>> > > > > > >> +1
>>> > > > > > >>
>>> > > > > > >>> On Oct 17, 2017 19:07, "Jim Apple" 
>>> > wrote:
>>> > > > > > >>>
>>> > > > > > >>> Following our discussion
>>> > > > > > >>> https://lists.apache.org/thread.html/
>>> > > > 2f5db4788aff9b0557354b9106c032
>>> > > > > > >>> 8a29c1f90c1a74a228163949d2@%3Cdev.impala.apache.org%3E
>>> > > > > > >>> , I propose that we graduate to a TLP. According to
>>> > > > > > >>> https://incubator.apache.org/guides/graduation.html#
>>> > > > > > >>> community_graduation_vote
>>> > > > > > >>> this is not required, and https://impala.apache.org/
>>> > bylaws.html
>>> > > > does
>>> > > > > > not
>>> > > > > > >>> say whose votes are "binding" in a graduation vote, so all
>>> > > > community
>>> > > > > > >>> members are welcome to vote.
>>> > > > > > >>>
>>> > > > > > >>> This will remain open 72 hours. I will be notifying
>>> > > > general@incubator
>>> > > > > > it
>>> > > > > > >>> is
>>> > > > > > >>> occurring.
>>> > > > > > >>>
>>> > > > > > >>> This is my +1.
>>> > > > > > >>>
>>> > > > > >
>>> > > > >
>>> > > >
>>> > >
>>> > >
>>> > >
>>> > > --
>>> > > Thanks,
>>> > > Michael
>>> > >
>>> >
>>>


Re: Time for graduation?

2017-10-15 Thread Marcel Kornacker
I am also in favor of graduation.

I'd be happy to serve as the initial PMC chair.

Marcel

On Sun, Oct 15, 2017 at 11:52 AM, Daniel Hecht  wrote:
> I'm in favor of graduation. We have come a long way and I agree it
> feels like we are functioning like an Apache TLP.
>
> Dan
>
> On Fri, Oct 13, 2017 at 9:49 AM, Tim Armstrong  
> wrote:
>> I'd be very happy if we could graduate too. We still obviously have to
>> continue working on growing the community but we've made a huge amount of
>> progress in setting up the infrastructure and processes to be successful as
>> a top level project.
>>
>> I think having users like Brock on the PMC is great so that we can get
>> input on whether the project is going in the right direction from their
>> point of view.
>>
>> - Tim
>>
>> On Thu, Oct 12, 2017 at 4:46 PM, Brock Noland  wrote:
>>
>>> Hi all,
>>>
>>> I've been thinking about this as well and I feel Impala is ready.
>>>
>>> (more inline)
>>>
>>> On Thu, Oct 12, 2017 at 6:06 PM, Todd Lipcon  wrote:
>>>
>>> > On Thu, Oct 12, 2017 at 3:24 PM, Jim Apple  wrote:
>>> >
>>> > > Also, mentors are traditionally included in a graduating podling's PMC,
>>> > > right?
>>> >
>>> > That's often been done but I don't think there's any hard requirement.
>>> > Perhaps we could ask each mentor whether they would like to continue to
>>> be
>>> > involved?
>>> >
>>>
>>> For my part, I don't feel I contribute much to the PMC, but Impala is a
>>> project I use everyday and thus have a strong interest in the project being
>>> successful. I would not be hurt in the *least* if I was not included on the
>>> PMC. However, I'd be more than happy to serve.
>>>
>>> Cheers,
>>> Brock
>>>


Re: Impala entry on Encyclopedia of Big Data

2017-09-12 Thread Marcel Kornacker
Hi Dimitris, I'd be happy to be part of this, and I'm happy to serve
as the contact person for the editorial board.

On Tue, Sep 12, 2017 at 10:22 AM, Dimitris Tsirogiannis
 wrote:
> I've been contacted by the members of the editorial board of the new
> Encyclopedia of Big Data asking if we can contribute an entry for Impala.
> Here are some details:
>
> Each entry contribution should not be more than 2-3 pages in length
> including figures and references. Please feel free to recruit one or two
> more co-authors, if you choose. Just let us know who will be responsible
> for this entry.
>
> The current timeline for the project is as follows:
>
> - Authors submitting their entries --> 15 November 2017
> - Section Editors release the first batch of accepted and approved entries
> --> 15 December 2017
> - Section Editors release the second batch of revised, accepted and
> approved entries -->  31 January 2018
>
> Let me know if there any volunteers (it doesn't have to be a single person)
> interested in writing the Impala entry and I will bring you in touch with
> the members of the editorial board.
>
> Regards
> Dimitris


Re: Should we change tests so they don't use single letter table names?

2017-05-12 Thread Marcel Kornacker
That sounds like a good idea. If it's a single-test table, it would
probably be prudent to embed the name of the test.

On Fri, May 12, 2017 at 4:38 AM, Lars Volker  wrote:

> Hi All,
>
> I frequently create test tables on my local system with names like "t" or
> "p". A couple of frontend tests use the same names and then fail with
> "Table already exists".
>
> Does anyone else hit this from time to time? Can we change the table names
> in the tests to avoid single letter names? If there are no objections, I'll
> open a JIRA.
>
> Thanks, Lars
>


New committer

2017-04-24 Thread Marcel Kornacker
/The Podling Project Management Committee (PPMC) for Apache Impala (incubating)
has invited Mostafa Mokhtar to become a committer and we are pleased to
announce that he has accepted.

Congratulations and welcome, Mostafa!


Re: impala.apache.org now hosting documentation

2017-04-19 Thread Marcel Kornacker
Great job!

On Wed, Apr 19, 2017 at 10:41 AM, Bharath Vissapragada <
bhara...@cloudera.com> wrote:

> The pdf version looks pretty cool. Thanks for all the work.
>
> On Wed, Apr 19, 2017 at 10:37 AM, Michael Brown 
> wrote:
>
>> With the resolution of IMPALA-3398, I'm pleased to announce that the
>> Apache Impala (incubating) Web site now self-hosts its documentation, here
>> http://impala.apache.org/impala-docs.html . Self-hosting documentation
>> should help grow the Impala community.
>>
>> Many in our community helped with this effort, but I'd like to
>> acknowledge the following members who put in a lot of their time to make
>> this happen:
>>
>> Ambreen Kazi
>> Jim Apple
>> John Russell
>> Laurel Hale
>>
>> Thanks to all!
>>
>
>


Re: Upstreaming ppc64le patches for native-toolchain

2017-04-17 Thread Marcel Kornacker
Agreed.

On Mon, Apr 17, 2017 at 2:54 PM, Tim Armstrong 
wrote:

> Something like that makes sense - I think it should be an experimental or
> unsupported feature until if/when we have a critical mass of committers to
> maintain it.
>
> On Mon, Apr 17, 2017 at 2:46 PM, Jim Apple  wrote:
>
> > That makes sense to me. It would be good for us to provide support
> without
> > completely focusing all development effort on a HW platform with fewer
> > users.
> >
> > If ppc64le support lands in 2.10, and between 2.10 and 2.11 the ppc64le
> > tests start breaking for reasons nobody with HW access can debug, should
> we
> > say in 2.11 release notes that ppc64le is no longer supported?
> >
> > Or perhaps, even in the first release where we think it works, we should
> > spell it out as a platform with only "best-effort" support, that way we
> > don't have to retract support?
> >
> > On Mon, Apr 17, 2017 at 2:41 PM, Marcel Kornacker 
> > wrote:
> >
> > > My main concern is that we don't unduly burden the development process.
> > As
> > > such, it makes a lot of sense to keep the PPC tests out of the regular
> > > tests and have the party that's interested in the PPC tests create the
> > > infrastructure and run those tests.
> > >
> > > On Mon, Apr 17, 2017 at 2:30 PM, Jim Apple 
> wrote:
> > >
> > > > One concern I have is sustainability. If only one Impala contributor
> > can
> > > > work with ppc64le, and that contributor is not as seasoned as some of
> > the
> > > > other committers, what happens if ppc64le breaks and the one person
> > with
> > > VM
> > > > access can't fix it?
> > > >
> > > > Part of my concern is just how flaky the current tests are, too. It
> > takes
> > > > some time to be able to distinguish broken tests that are flaky and
> > > broken
> > > > tests that are the result of a specific commit.
> > > >
> > > > My hope was that with a ppc64le VM (maybe through Qemu?) that runs on
> > > > x86-64 Linux, the other contributors could help fix things that broke
> > on
> > > > that platform.
> > > >
> > > > On Mon, Apr 17, 2017 at 12:51 PM, Marcel Kornacker <
> > mar...@cloudera.com>
> > > > wrote:
> > > >
> > > > > +1
> > > > >
> > > > > On Mon, Apr 17, 2017 at 11:56 AM, Henry Robinson <
> he...@cloudera.com
> > >
> > > > > wrote:
> > > > >
> > > > > > +1
> > > > > >
> > > > > > On Mon, Apr 17, 2017 at 9:09 AM Tim Armstrong <
> > > tarmstr...@cloudera.com
> > > > >
> > > > > > wrote:
> > > > > >
> > > > > > > I feel like we shouldn't make PPC part of pre-commit at least
> > > > > initially -
> > > > > > > it's an unreasonable barrier if contributors/committers to
> debug
> > > > issues
> > > > > > on
> > > > > > > a platform they don't have easy access to. Having the testing
> > infra
> > > > is
> > > > > > > still important because we don't want to have code in there
> that
> > > will
> > > > > > > gradually bit-rot without us noticing.
> > > > > > >
> > > > > > > On Mon, Apr 17, 2017 at 8:51 AM, Silvius Rus <
> s...@cloudera.com>
> > > > > wrote:
> > > > > > >
> > > > > > > > Would it make sense to _not_ run PPC tests as part of
> > presubmit?
> > > > > > Instead
> > > > > > > > Valencia could set up nightly tests using in-house
> > > infrastructure.
> > > > > And
> > > > > > > > share the test results, e.g., by sending them to a new email
> > list
> > > > > > > > te...@impala.incubator.apache.org (that we'd need to create)
> > so
> > > > > > everyone
> > > > > > > > can see when there are failures or if coverage stops for
> > whatever
> > > > > > reason.
> > > > > > > > GCC has been doing something like this for long,
> > > > > > > > https://gcc.gnu.org/ml/gcc-testresults/2017-04/.
> > > > > > > >
> > > > > > > > On Tue, Apr 11, 2017

Re: Upstreaming ppc64le patches for native-toolchain

2017-04-17 Thread Marcel Kornacker
My main concern is that we don't unduly burden the development process. As
such, it makes a lot of sense to keep the PPC tests out of the regular
tests and have the party that's interested in the PPC tests create the
infrastructure and run those tests.

On Mon, Apr 17, 2017 at 2:30 PM, Jim Apple  wrote:

> One concern I have is sustainability. If only one Impala contributor can
> work with ppc64le, and that contributor is not as seasoned as some of the
> other committers, what happens if ppc64le breaks and the one person with VM
> access can't fix it?
>
> Part of my concern is just how flaky the current tests are, too. It takes
> some time to be able to distinguish broken tests that are flaky and broken
> tests that are the result of a specific commit.
>
> My hope was that with a ppc64le VM (maybe through Qemu?) that runs on
> x86-64 Linux, the other contributors could help fix things that broke on
> that platform.
>
> On Mon, Apr 17, 2017 at 12:51 PM, Marcel Kornacker 
> wrote:
>
> > +1
> >
> > On Mon, Apr 17, 2017 at 11:56 AM, Henry Robinson 
> > wrote:
> >
> > > +1
> > >
> > > On Mon, Apr 17, 2017 at 9:09 AM Tim Armstrong  >
> > > wrote:
> > >
> > > > I feel like we shouldn't make PPC part of pre-commit at least
> > initially -
> > > > it's an unreasonable barrier if contributors/committers to debug
> issues
> > > on
> > > > a platform they don't have easy access to. Having the testing infra
> is
> > > > still important because we don't want to have code in there that will
> > > > gradually bit-rot without us noticing.
> > > >
> > > > On Mon, Apr 17, 2017 at 8:51 AM, Silvius Rus 
> > wrote:
> > > >
> > > > > Would it make sense to _not_ run PPC tests as part of presubmit?
> > > Instead
> > > > > Valencia could set up nightly tests using in-house infrastructure.
> > And
> > > > > share the test results, e.g., by sending them to a new email list
> > > > > te...@impala.incubator.apache.org (that we'd need to create) so
> > > everyone
> > > > > can see when there are failures or if coverage stops for whatever
> > > reason.
> > > > > GCC has been doing something like this for long,
> > > > > https://gcc.gnu.org/ml/gcc-testresults/2017-04/.
> > > > >
> > > > > On Tue, Apr 11, 2017 at 9:44 AM, Jim Apple 
> > > wrote:
> > > > >
> > > > > > >
> > > > > > > Locally, I work on native-toolchain using a VM configured with
> > > > > > > Ubuntu16.04ppc64le, 4GB RAM and 50GB of HDD. If  we provide
> you a
> > > VM
> > > > > with
> > > > > > > this config, will it be sufficient ?
> > > > > > >
> > > > > >
> > > > > > What hypervisor/emulator will it use?
> > > > > >
> > > > > > What are the requirements of the host OS and host hardware?
> > > > > >
> > > > > > Why is the config you have it set to so important that you
> mention
> > it
> > > > in
> > > > > > your email - will the config be locked down into that config or
> can
> > > it
> > > > be
> > > > > > reconfigured later?
> > > > > >
> > > > > > How is the VM controlled from the host OS? Keep in mind that a
> GUI
> > > > cannot
> > > > > > be the only option for automated tests.
> > > > > >
> > > > > > FWIW, Impala's test suite probably cannot fully complete without
> at
> > > > least
> > > > > > 8, and I suspect 16, GB of RAM, and we might need more disk
> space,
> > > too,
> > > > > but
> > > > > > these should be reconfigurable with most hypervisors/emulators.
> > > > > >
> > > > >
> > > >
> > > --
> > > Henry Robinson
> > > Software Engineer
> > > Cloudera
> > > 415-994-6679
> > >
> >
>


Re: Upstreaming ppc64le patches for native-toolchain

2017-04-17 Thread Marcel Kornacker
+1

On Mon, Apr 17, 2017 at 11:56 AM, Henry Robinson  wrote:

> +1
>
> On Mon, Apr 17, 2017 at 9:09 AM Tim Armstrong 
> wrote:
>
> > I feel like we shouldn't make PPC part of pre-commit at least initially -
> > it's an unreasonable barrier if contributors/committers to debug issues
> on
> > a platform they don't have easy access to. Having the testing infra is
> > still important because we don't want to have code in there that will
> > gradually bit-rot without us noticing.
> >
> > On Mon, Apr 17, 2017 at 8:51 AM, Silvius Rus  wrote:
> >
> > > Would it make sense to _not_ run PPC tests as part of presubmit?
> Instead
> > > Valencia could set up nightly tests using in-house infrastructure.  And
> > > share the test results, e.g., by sending them to a new email list
> > > te...@impala.incubator.apache.org (that we'd need to create) so
> everyone
> > > can see when there are failures or if coverage stops for whatever
> reason.
> > > GCC has been doing something like this for long,
> > > https://gcc.gnu.org/ml/gcc-testresults/2017-04/.
> > >
> > > On Tue, Apr 11, 2017 at 9:44 AM, Jim Apple 
> wrote:
> > >
> > > > >
> > > > > Locally, I work on native-toolchain using a VM configured with
> > > > > Ubuntu16.04ppc64le, 4GB RAM and 50GB of HDD. If  we provide you a
> VM
> > > with
> > > > > this config, will it be sufficient ?
> > > > >
> > > >
> > > > What hypervisor/emulator will it use?
> > > >
> > > > What are the requirements of the host OS and host hardware?
> > > >
> > > > Why is the config you have it set to so important that you mention it
> > in
> > > > your email - will the config be locked down into that config or can
> it
> > be
> > > > reconfigured later?
> > > >
> > > > How is the VM controlled from the host OS? Keep in mind that a GUI
> > cannot
> > > > be the only option for automated tests.
> > > >
> > > > FWIW, Impala's test suite probably cannot fully complete without at
> > least
> > > > 8, and I suspect 16, GB of RAM, and we might need more disk space,
> too,
> > > but
> > > > these should be reconfigurable with most hypervisors/emulators.
> > > >
> > >
> >
> --
> Henry Robinson
> Software Engineer
> Cloudera
> 415-994-6679
>


Re: Scope of abort_on_config_error is too large

2017-04-10 Thread Marcel Kornacker
We want to reduce the number of startup flags.

Does it make more sense simply not to regard starting without a
collocated DN as a config error?

On Mon, Apr 10, 2017 at 12:09 PM, Mostafa Mokhtar  wrote:
> When deploying Impala on hosts without a co-located HDFS Data node Impala
> won't start, unless abort_on_config_error=false is passed as a safety valve.
>
> Concern is that abort_on_config_error checks more than just Short circuit
> reads.
>
> Does it make sense to move Short circuit read check out of
> abort_on_config_error or put it in a separate flag?
>
> fe/src/main/java/org/apache/impala/service/JniFrontend.java
>   /**
>* Returns an error string describing all configuration issues. If no
> config issues are
>* found, returns an empty string.
>*/
>   public String checkConfiguration() {
> StringBuilder output = new StringBuilder();
> output.append(checkLogFilePermission());
> output.append(checkFileSystem(CONF));
> output.append(checkShortCircuitRead(CONF));
> return output.toString();
>   }
>
> be/src/service/impala-server.cc
>   Status status = exec_env_->frontend()->ValidateSettings();
>   if (!status.ok()) {
> LOG(ERROR) << status.GetDetail();
> if (FLAGS_abort_on_config_error) {
>   CLEAN_EXIT_WITH_ERROR(
>   "Aborting Impala Server startup due to improper configuration");
> }
>   }
>
>   status = exec_env->tmp_file_mgr()->Init(exec_env->metrics());
>   if (!status.ok()) {
> LOG(ERROR) << status.GetDetail();
> if (FLAGS_abort_on_config_error) {
>   CLEAN_EXIT_WITH_ERROR("Aborting Impala Server startup due to
> improperly "
>"configured scratch directories.");
> }


Re: Upcoming changes to distcc scripts

2017-04-05 Thread Marcel Kornacker
I'm now getting "failed to distribute, running locally instead" for
every source file. Did anything else change?

On Mon, Apr 3, 2017 at 11:44 AM, Tim Armstrong  wrote:
> I'm about to start a merge for a patch that solves some of the issues with
> distcc - switching between ASAN and non-ASAN builds and with ccache:
> https://gerrit.cloudera.org/#/c/6493/ . Unfortunately it won't be
> completely transparent. When you pull down the patch, you'll need to:
>
>- Open a new terminal or run "unset IMPALA_CXX_COMPILER" in your current
>terminal
>- Source bin/distcc/distcc_env.sh
>- Rebuild with buildall.sh
>
> If you run into any problems, you can try cleaning out your cmake-generated
> files.
>
>- ./bin/clean.sh && ./bin/create-test-configuration.sh
>
>
> Cheers,
> Tim


Re: Min/Max runtime filtering on Impala-Kudu

2017-03-27 Thread Marcel Kornacker
On Mon, Mar 27, 2017 at 11:42 AM, Sailesh Mukil  wrote:
> I will be working on a patch to add min/max filter support in Impala, and
> as a first step, specifically target the KuduScanNode, since the Kudu
> client is already able to accept a Min and a Max that it would internally
> use to filter during its scans. Below is a brief design proposal.
>
> *Goal:*
>
> To leverage runtime min/max filter support in Kudu for the potential speed
> up of queries over Kudu tables. Kudu does this by taking a min and a max
> that Impala will provide and only return values in the range Impala is
> interested in.
>
> *[min <= range we're interested in >= max]*
>
> *Proposal:*
>
>
>- As a first step, plumb the runtime filter code from
> *exec/hdfs-scan-node-base.cc/h
>* to *exec/scan-node.cc/h
>*, so that it can be applied to *KuduScanNode*
>cleanly as well, since *KuduScanNode* and *HdfsScanNodeBase* both
>inherit from *ScanNode.*

Quick comment: please make sure your solution also applies to KuduScanNodeMt.

>- Reuse the *ColumnStats* class (exec/parquet-column-stats.h) or
>implement a lighter weight version of it to process and store the Min and
>the Max on the build side of the join.
>- Once the Min and Max values are added to the existing runtime filter
>structures, as a first step, we will ignore the Min and Max values for
>non-Kudu tables. Using them for non-Kudu tables can come in as a following
>patch(es).
>- Similarly, the bloom filter will be ignored for Kudu tables, and only
>the Min and Max values will be used, since Kudu does not accept bloom
>filters yet. (https://issues.apache.org/jira/browse/IMPALA-3741)
>- Applying the bloom filter on the Impala side of the Kudu scan (i.e. in
>KuduScanNode) is not in the scope of this patch.
>
>
> *Complications:*
>
>- We have to make sure that finding the Min and Max values on the build
>side doesn't regress certain workloads, since the difference between
>generating a bloom filter and generating a Min and a Max, is that a bloom
>filter can be type agnostic (we just take a raw hash over the data) whereas
>a Min and a Max have to be type specific.


Re: Toolchain - versioning dependencies with the same version number

2017-02-28 Thread Marcel Kornacker
Yes, I too am particularly concerned about maintaining the ability to
build offline, and downloading the same things over and over again.

I don't quite understand the case against versioning - if gc'ing
obsolete versions in order to reduce storage space is a concern, then
it's probably fine to a) blow away and re-download everything, or b)
throw away old versions manually, if you happen to be in a situation
where a) isn't possible.

On Tue, Feb 28, 2017 at 12:20 PM, Tim Armstrong  wrote:
> I agree it's not too bad if you have a fat pipe to S3, but it's a pretty
> bad regression in usability to make it the default and particularly provide
> no way to opt out.
>
> The toolchain is almost 1GB though, which is pretty problematic to download
> if a developer is on coffee-shop wifi, cellular wireless, airplane wifi,
> etc. It'd also be pretty easy for a developer working offline to switch
> branches, run buildall.sh, have gcc, etc, automatically deleted and then be
> stuck unable to build anything.
>
>
> On Tue, Feb 28, 2017 at 9:07 AM, Henry Robinson  wrote:
>
>> I'd prefer not to do that because it's something of a hack and generates
>> too many artifacts if we make incremental build changes, not to mention the
>> extra complexity required to make such a change because new tarballs might
>> need to be uploaded.
>>
>>
>>
>>
>> On Tue, Feb 28, 2017 at 8:55 AM Lars Volker  wrote:
>>
>> > Can we add another version string component like -1 or -impala1, or add a
>> > dummy patch to the affected packages to allow for new versions with the
>> > same upstream version? I think this is what Linux distributions commonly
>> do
>> > to have several versions of the same upstream version.
>> >
>> > On Feb 27, 2017 21:15, "Henry Robinson"  wrote:
>> >
>> > Yes, it would force re-downloading. At my office, downloading a toolchain
>> > takes a matter of a few seconds, so I'm not sure the cost is that great.
>> > And if it turned out to be problematic, one could always change the
>> > toolchain directory for different branches. Having something locally that
>> > set IMPALA_TOOLCHAIN_DIR=${IMPALA_HOME}/${IMPALA_TOOLCHAIN_BUILD_ID}/
>> would
>> > work.
>> >
>> > However I wouldn't want to force behaviour that into the toolchain
>> scripts
>> > because of the need for garbage collection it would raise - it wouldn't
>> be
>> > clear when to delete old toolchains programatically.
>> >
>> > On 27 February 2017 at 20:51, Tim Armstrong 
>> > wrote:
>> >
>> > > Maybe I'm misunderstanding, but wouldn't that force re-downloading of
>> the
>> > > entire toolchain every time a developer switches between branches with
>> > > different build IDs?
>> > >
>> > > I know some developers do that frequently, e.g. to try and reproduce
>> bugs
>> > > on older versions or backport patches.
>> > >
>> > > I agree it would be good to fix this, since I've run into this problem
>> > > before, I'm just not quite sure what the best solution is. In the other
>> > > case where I had this issue with LLVM I changed the version number (by
>> > > appending noasserts-) to it, but that's really just a hack.
>> > >
>> > > -Tim
>> > >
>> > > On Mon, Feb 27, 2017 at 4:35 PM, Henry Robinson 
>> > > wrote:
>> > >
>> > > > As Matt said, I have a patch that implements build ID-based
>> versioning
>> > at
>> > > > https://gerrit.cloudera.org/#/c/6166/2.
>> > > >
>> > > > Does anyone want to take a look? If we could get this in soon it
>> would
>> > > help
>> > > > smooth over the LZ4 change which is going in shortly.
>> > > >
>> > > > On 27 February 2017 at 14:21, Henry Robinson 
>> > wrote:
>> > > >
>> > > > > I agree that that might be useful, and that it's a separately
>> > > addressable
>> > > > > problem.
>> > > > >
>> > > > > On 27 February 2017 at 14:18, Matthew Jacobs 
>> > wrote:
>> > > > >
>> > > > >> Just catching up to this e-mail, though I had seen your code
>> reviews
>> > > > >> and I think this approach makes sense. An additional concern would
>> > be
>> > > > >> how to identify how a toolchain package was built, and AFAIK this
>> is
>> > > > >> tricky now if only the 'toolchain ID' is known. Before I saw this
>> > > > >> e-mail I was thinking about this problem (which I think we can
>> > address
>> > > > >> separately), and that we might want to write the native-toolchain
>> > git
>> > > > >> hash with every toolchain build so that the exact build scripts
>> are
>> > > > >> associated with those build artifacts. I filed
>> > > > >> https://issues.cloudera.org/browse/IMPALA-5002 for this related
>> > > > >> problem.
>> > > > >>
>> > > > >> On Sat, Feb 25, 2017 at 10:22 PM, Henry Robinson <
>> he...@apache.org>
>> > > > >> wrote:
>> > > > >> > As written, the toolchain can't apparently deal with the
>> > possibility
>> > > > of
>> > > > >> > build flags changing, but a dependency version remaining the
>> same.
>> > > > >> >
>> > > > >> > LZ4 has never (afaict) been built with optimization enabled. I
>> > have
>> > > a
>> > > > >> > commit that enables -O3, bu

Re: status-benchmark.cc compilation time

2017-02-21 Thread Marcel Kornacker
I'm also in favor of not compiling it on the standard commandline.

However, I'm very much against allowing the benchmarks to bitrot. As
was pointed out, those benchmarks can be valuable tools during
development, and keeping them in working order shouldn't really impact
the development process.

In other words, let's compile them as part of gvo.

On Tue, Feb 21, 2017 at 10:50 AM, Alex Behm  wrote:
> +1 for not compiling the benchmarks in -notests
>
> On Mon, Feb 20, 2017 at 7:55 PM, Jim Apple  wrote:
>
>> > On which note, would anyone object if we disabled benchmark compilation
>> by
>> > default when building the BE tests? I mean separating out -notests into
>> > -notests and -build_benchmarks (the latter false by default).
>>
>> I think this is a great idea.
>>
>> > I don't mind if the benchmarks bitrot as a result, because we don't run
>> > them regularly or pay attention to their output except when developing a
>> > feature. Of course, maybe an 'exhaustive' run should build the benchmarks
>> > as well just to keep us honest, but I'd be happy if 95% of Jenkins builds
>> > didn't bother.
>>
>> The pre-merge (aka GVM aka GVO) testing builds
>> http://jenkins.impala.io:8080/job/all-build-options, which builds
>> without the "-notests" flag.
>>


Re: [ANNOUNCE] Apache Impala (incubating) 2.8.0 release

2017-01-23 Thread Marcel Kornacker
Excellent. Jim, thank you for managing this release.

On Sun, Jan 22, 2017 at 11:04 PM, Jim Apple  wrote:
> The Apache Impala (incubating) team is pleased to announce the release
> of Impala 2.8.0.
>
> Impala is a high-performance C++ and Java SQL query engine for data
> stored in Apache
> Hadoop-based clusters.
>
> The release is available at: 
> https://impala.incubator.apache.org/downloads.html
>
> Thanks,
>
> The Apache Impala (incubating) team
>
> =
>
> *Disclaimer*
>
> Apache Impala is an effort undergoing incubation at The Apache
> Software Foundation (ASF),
> sponsored by the name of Apache Incubator PMC. Incubation is required
> of all newly accepted
> projects until a further review indicates that the infrastructure,
> communications, and
> decision making process have stabilized in a manner consistent with
> other successful ASF
> projects. While incubation status is not necessarily a reflection of
> the completeness or
> stability of the code, it does indicate that the project has yet to be
> fully endorsed by
> the ASF.


tagging docs-related patches

2017-01-19 Thread Marcel Kornacker
In order to make the slew of auto-generated mail from gerrit more
consumable, I want to propose tagging docs-related patches with
something like
"IMPALA-: [DOCS] "

I don't think this adds any measurable overhead to the
development/docs writing process, but it'll make it easier to
recognize docs-related changes in a list of email subject lines.

Any objections?


Re: run-all.sh stuck at 'Starting kms'

2017-01-11 Thread Marcel Kornacker
It should be (and ntpd is running), and somehow it did get unstuck.

On Wed, Jan 11, 2017 at 4:08 PM, Henry Robinson  wrote:
> I think that's actually stuck at synchronizing NTP, a prerequisite for
> starting Kudu. Is your system clock in sync?
>
> On 11 January 2017 at 16:06, Marcel Kornacker  wrote:
>
>> Has anyone else seen that? What log files should I be looking at for
>> diagnostics?
>>
>
>
>
> --
> Henry Robinson
> Software Engineer
> Cloudera
> 415-994-6679


run-all.sh stuck at 'Starting kms'

2017-01-11 Thread Marcel Kornacker
Has anyone else seen that? What log files should I be looking at for
diagnostics?


Re: Preferred syntax for warning about ignored Status returns

2017-01-06 Thread Marcel Kornacker
I'd vote for option 2, which is visually less distracting and still
communicates the same thing (in the end, it's a function qualifier).

On Fri, Jan 6, 2017 at 2:59 PM, Tim Armstrong  wrote:
> Hi All,
>   I wanted to poll the Impala community for opinions about style for
> declaring functions where the caller is expected to do something with the
> return value.
>
> Ideally we'd be able to declare Status with an attribute that made this
> take effect globally, but unfortunately that's not available until C++17.
>
> So we need to annotate each Status-returning function. The two alternatives
> we discussed on this CR (https://gerrit.cloudera.org/#/c/4878/) were:
>
> #1 - a special macro wrapping Status
>
> MUST_USE(Status) DoSomethingThatCanFail(int64_t foo, Bar* bar);
>
> Pros:
> * Closely connected to the return type that it affects
> * It's easier to search/replace Status with MUST_USE(Status)
>
> Cons:
> * Could get visually noisy if we use it everywhere
>
> #2 - a macro that gets appended to the declaration:
>
> Status DoSomethingThatCanFail(int64_t foo, Bar* bar) WARN_UNUSED_RESULT;
>
> Pros:
> * Macro is slightly
> * Less visually noisy since it's at the end of the declaration
>
> What do people think?


Re: [DISCUSS] Release 2.8.0 soon?

2017-01-04 Thread Marcel Kornacker
Also +1 for doing a 2.8.0 release.

On Wed, Jan 4, 2017 at 2:29 PM, Henry Robinson  wrote:
> +1 to releasing 2.8.0 - it's been a long time since 2.7.0 and a lot has
> been done.
>
> I'd also like to release 2.7.1 in the near future, and will try to find
> time to be the RM for that.
>
> On 4 January 2017 at 14:15, Jim Apple  wrote:
>
>> Oh, and a specific topic for discussion:
>>
>> Should we include the docs in the source tarball? My feeling is yes,
>> but they are not really cleaned up yet, and so contain a lot of
>> Cloudera-specific info.
>>
>> Pros of including the docs in the tarball: Users get a docs tarball
>> from which they can build the hundreds of pages of usable
>> documentation.
>>
>> Cons: 1. Confused users about who runs the project. (Correct answer:
>> Apache Impala PPMC. Confused answer: Cloudera), 2. Possible failed
>> IPMC release vote that slows down the release.
>>
>
> I'd figure out a way to add a big caveat to the docs. Maybe on the landing
> page? Even better if there's a template we can add a caveat to that appears
> on every page.
>
>
>
>>
>> On Wed, Jan 4, 2017 at 2:09 PM, Jim Apple  wrote:
>> > This is not a [VOTE] thread. Everyone is encourage to participate.
>> >
>> > I am volunteering to be a release manager for 2.8.0. I am provisionally
>> testing
>> >
>> > https://git-wip-us.apache.org/repos/asf?p=incubator-impala.git;a=tree;h=
>> 4fa9270e647b9c097295dcc13d97136cca3139ad;hb=4fa9270e647b9c097295dcc13d9713
>> 6cca3139ad
>> >
>> > to branch from.
>> >
>> > Are there any objections to releasing 2.8.0 soon? Keep in mind this is
>> > not your last chance to speak - there will be at least two votes one
>> > for PPMC releasing and one for IPMC releasing. See
>> >
>> > https://cwiki.apache.org/confluence/display/IMPALA/
>> DRAFT%3A+How+to+Release
>>


hbase in core tests

2016-12-14 Thread Marcel Kornacker
I'm wondering whether we should remove hbase coverage from core tests
and only run against hbase in exhaustive tests.

This would not remove coverage in absolute terms, but detecting a
newly-introduced regression against hbase functionality would take
longer and might require more work to pin down the exact commit.

Does anybody believe this would place an extra burden on the
development process that would outweigh the benefits (faster test data
loads and test execution)?


Re: Jenkins jobs to verify patches

2016-12-13 Thread Marcel Kornacker
Regarding suggestion 1: let's make that 01/09 or later to guarantee a
smoother transition.

On Tue, Dec 13, 2016 at 1:51 PM, Jim Apple  wrote:
> Before Impala joined the ASF, code reviews had to go through a
> "verification" step in which a Jenkins jobs downloaded the patch, ran
> all tests and replied back to Jenkins that everything was OK. That
> Jenkins job ran on Cloudera infrastructure and could not be accessed,
> even in a read-only way, by people outside of Cloudera.
>
> To follow the Apache way, I am laboring to replace that with a Jenkins
> server that can be used by any authorized person and read by any
> person. It is at http://jenkins.impala.io:8080. It is able to verify
> patches just like the old Jenkins machine. A few remaining questions:
>
> 1. What should the prerequisites be to turning off the Cloudera-only
> Jenkins verification path?
>
> 2. Who should be able to run jobs on jenkins.impala.io? Some
> possibilities: Committers only, anyone who asks, PMC members only,
> contributors who ask after submitting 5 patches. Higher bars lead to
> less likelihood of abuse, lower ones to easier contributions from
> newbies.
>
> My proposal is this:
>
> 1. We should turn off the CLoudera-only Jenkins verification path
> January 2. jenkins.impala.io is in pretty good shape, and we can make
> further improvements as needed. For instance, it took me five minutes
> just now to cut down on the not-so-interesting debug output from the
> main verification job,
> http://jenkins.impala.io:8080/job/ubuntu-14.04-from-scratch/
>
> 2. Everyone with five patches can request a login.
>
> I'm not married to these ideas, but I wanted to provide a jumping-off
> point for discussion.


Re: Need a +2 to revert the revert to bump the kudu version

2016-12-13 Thread Marcel Kornacker
Done.

On Tue, Dec 13, 2016 at 6:30 AM, Matthew Jacobs  wrote:
> I submitted a gvo optimistically (passed) but this still needs a +2:
>
> https://gerrit.cloudera.org/#/c/5487/
>
> The fix that was needed for this to pass tests is:
>
> commit 73e41cea196703701d40cc67f919287fb3511b9b
> Author: Matthew Jacobs 
> Date: Mon Dec 12 14:32:07 2016 -0800
>
> IMPALA-4642: Fix TestFragmentLifecycle failures; kudu test must wait


Re: Important: new shell & bootstrap required after rebasing, again

2016-12-11 Thread Marcel Kornacker
Sorry, my mistake.

On Sun, Dec 11, 2016 at 9:07 AM, Matthew Jacobs  wrote:
> Try just setting the toolchain build id manually:
> export IMPALA_TOOLCHAIN_BUILD_ID=308-96a4cc516e
>
> Anyway this is annoying, +1 to Tim's proposal for improved environment 
> configs.
>
> On Sun, Dec 11, 2016 at 9:04 AM, Matthew Jacobs  wrote:
>> The URL is still referencing an old toolchain version 267 (267-98c642ffcb).
>>
>> The new version is 308-96a4cc516e ; starting a new shell and sourcing
>> bin/impala-config.sh shouldn't leave any opportunities for the old
>> version to be in your environment. IMPALA_TOOLCHAIN_BUILD_ID must be
>> getting set still.
>>
>> On Sun, Dec 11, 2016 at 8:58 AM, Marcel Kornacker  
>> wrote:
>>> I'm getting Kudu client download errors fairly reliably now. This is
>>> from a clean shell.
>>>
>>> Downloading kudu-e018a83-gcc-4.9.2-ec2-package-ubuntu-14-04.tar.gz to
>>> /home/marcel/impala/toolchain
>>> Traceback (most recent call last):
>>>   File "/home/marcel/impala/bin/bootstrap_toolchain.py", line 355, in 
>>> 
>>> bootstrap(toolchain_root, packages)
>>>   File "/home/marcel/impala/bin/bootstrap_toolchain.py", line 117, in 
>>> bootstrap
>>> download_package(toolchain_root, pkg_name, pkg_version, compiler)
>>>   File "/home/marcel/impala/bin/bootstrap_toolchain.py", line 99, in
>>> download_package
>>> wget_and_unpack_package(download_path, file_name, destination, True)
>>>   File "/home/marcel/impala/bin/bootstrap_toolchain.py", line 81, in
>>> wget_and_unpack_package
>>> sh.wget(download_path, directory_prefix=destination,
>>> no_clobber=wget_no_clobber)
>>>   File 
>>> "/home/marcel/impala/infra/python/env/local/lib/python2.7/site-packages/sh.py",
>>> line 1021, in __call__
>>> return RunningCommand(cmd, call_args, stdin, stdout, stderr)
>>>   File 
>>> "/home/marcel/impala/infra/python/env/local/lib/python2.7/site-packages/sh.py",
>>> line 486, in __init__
>>> self.wait()
>>>   File 
>>> "/home/marcel/impala/infra/python/env/local/lib/python2.7/site-packages/sh.py",
>>> line 500, in wait
>>> self.handle_command_exit_code(exit_code)
>>>   File 
>>> "/home/marcel/impala/infra/python/env/local/lib/python2.7/site-packages/sh.py",
>>> line 516, in handle_command_exit_code
>>> raise exc(self.ran, self.process.stdout, self.process.stderr)
>>> sh.ErrorReturnCode_8:
>>>
>>>   RAN: '/usr/bin/wget
>>> https://native-toolchain.s3.amazonaws.com/build/267-98c642ffcb/kudu/e018a83-gcc-4.9.2/kudu-e018a83-gcc-4.9.2-ec2-package-ubuntu-14-04.tar.gz
>>> --directory-prefix=/home/marcel/impala/toolchain --no-clobber'
>>>
>>>   STDOUT:
>>>
>>>
>>>   STDERR:
>>> --2016-12-11 08:56:17--
>>> https://native-toolchain.s3.amazonaws.com/build/267-98c642ffcb/kudu/e018a83-gcc-4.9.2/kudu-e018a83-gcc-4.9.2-ec2-package-ubuntu-14-04.tar.gz
>>> Resolving native-toolchain.s3.amazonaws.com
>>> (native-toolchain.s3.amazonaws.com)... 54.231.237.83
>>> Connecting to native-toolchain.s3.amazonaws.com
>>> (native-toolchain.s3.amazonaws.com)|54.231.237.83|:443... connected.
>>> HTTP request sent, awaiting response... 403 Forbidden
>>> 2016-12-11 08:56:17 ERROR 403: Forbidden.
>>>
>>> On Sat, Dec 10, 2016 at 11:11 PM, Matthew Jacobs  wrote:
>>>> Hi all,
>>>>
>>>>
>>>> We had to bump the Kudu version again so unfortunately you'll have to
>>>> refresh your environment again to get the latest Kudu. After fetching
>>>> the latest Impala commits (including commit 39017adf) and rebasing,
>>>> you'll need to:
>>>>
>>>>
>>>> 1) start a new shell OR make sure to
>>>> export IMPALA_TOOLCHAIN_BUILD_ID=308-96a4cc516e
>>>>
>>>>
>>>> 2) Run buildall.sh, that will:
>>>>a) download the new Kudu bits in the toolchain by calling
>>>>bootstrap_toolchain.py (or you can do so yourself)
>>>>b) restart the minicluster with the new kudu
>>>>
>>>>
>>>>
>>>> From Lars:
>>>> If you don't want to mess with your pane layout in tmux you can run
>>>> tmux respawn-pane -k
>>>> to kill and replace the current pane with an entirely fresh shell.
>>>> I bind this to a shortcut in my tmux config like so:
>>>> bind K respawn-pane -k


Re: Important: new shell & bootstrap required after rebasing, again

2016-12-11 Thread Marcel Kornacker
I'm getting Kudu client download errors fairly reliably now. This is
from a clean shell.

Downloading kudu-e018a83-gcc-4.9.2-ec2-package-ubuntu-14-04.tar.gz to
/home/marcel/impala/toolchain
Traceback (most recent call last):
  File "/home/marcel/impala/bin/bootstrap_toolchain.py", line 355, in 
bootstrap(toolchain_root, packages)
  File "/home/marcel/impala/bin/bootstrap_toolchain.py", line 117, in bootstrap
download_package(toolchain_root, pkg_name, pkg_version, compiler)
  File "/home/marcel/impala/bin/bootstrap_toolchain.py", line 99, in
download_package
wget_and_unpack_package(download_path, file_name, destination, True)
  File "/home/marcel/impala/bin/bootstrap_toolchain.py", line 81, in
wget_and_unpack_package
sh.wget(download_path, directory_prefix=destination,
no_clobber=wget_no_clobber)
  File 
"/home/marcel/impala/infra/python/env/local/lib/python2.7/site-packages/sh.py",
line 1021, in __call__
return RunningCommand(cmd, call_args, stdin, stdout, stderr)
  File 
"/home/marcel/impala/infra/python/env/local/lib/python2.7/site-packages/sh.py",
line 486, in __init__
self.wait()
  File 
"/home/marcel/impala/infra/python/env/local/lib/python2.7/site-packages/sh.py",
line 500, in wait
self.handle_command_exit_code(exit_code)
  File 
"/home/marcel/impala/infra/python/env/local/lib/python2.7/site-packages/sh.py",
line 516, in handle_command_exit_code
raise exc(self.ran, self.process.stdout, self.process.stderr)
sh.ErrorReturnCode_8:

  RAN: '/usr/bin/wget
https://native-toolchain.s3.amazonaws.com/build/267-98c642ffcb/kudu/e018a83-gcc-4.9.2/kudu-e018a83-gcc-4.9.2-ec2-package-ubuntu-14-04.tar.gz
--directory-prefix=/home/marcel/impala/toolchain --no-clobber'

  STDOUT:


  STDERR:
--2016-12-11 08:56:17--
https://native-toolchain.s3.amazonaws.com/build/267-98c642ffcb/kudu/e018a83-gcc-4.9.2/kudu-e018a83-gcc-4.9.2-ec2-package-ubuntu-14-04.tar.gz
Resolving native-toolchain.s3.amazonaws.com
(native-toolchain.s3.amazonaws.com)... 54.231.237.83
Connecting to native-toolchain.s3.amazonaws.com
(native-toolchain.s3.amazonaws.com)|54.231.237.83|:443... connected.
HTTP request sent, awaiting response... 403 Forbidden
2016-12-11 08:56:17 ERROR 403: Forbidden.

On Sat, Dec 10, 2016 at 11:11 PM, Matthew Jacobs  wrote:
> Hi all,
>
>
> We had to bump the Kudu version again so unfortunately you'll have to
> refresh your environment again to get the latest Kudu. After fetching
> the latest Impala commits (including commit 39017adf) and rebasing,
> you'll need to:
>
>
> 1) start a new shell OR make sure to
> export IMPALA_TOOLCHAIN_BUILD_ID=308-96a4cc516e
>
>
> 2) Run buildall.sh, that will:
>a) download the new Kudu bits in the toolchain by calling
>bootstrap_toolchain.py (or you can do so yourself)
>b) restart the minicluster with the new kudu
>
>
>
> From Lars:
> If you don't want to mess with your pane layout in tmux you can run
> tmux respawn-pane -k
> to kill and replace the current pane with an entirely fresh shell.
> I bind this to a shortcut in my tmux config like so:
> bind K respawn-pane -k


AtomicInt vs. std::atomic

2016-12-08 Thread Marcel Kornacker
We have an AtomicInt in common/atomic.h that's based on some gutil
libraries. Is there any reason to keep this instead of switching to
std::atomic?


Re: Ideas for organizing 3.0

2016-11-05 Thread Marcel Kornacker
dont-try-this-in-2dot8?

On Fri, Nov 4, 2016 at 3:13 PM, Jim Apple  wrote:
> Love it. What shall we call it?
>
> On Thu, Nov 3, 2016 at 10:22 PM, Henry Robinson  wrote:
>> If possible, can we hide all compatibility-breaking features behind the
>> *same* flag, so as to limit the combinatorial explosion of features turned
>> on and off?
>>
>> On 3 November 2016 at 16:50, Daniel Hecht  wrote:
>>
>>> +1 for using flags to avoid branching too early, and then flipping the
>>> defaults for 3.0 (and eventually removing the old code paths in/after
>>> 3.0).  Of course, not everything can be handled this way, but many can.
>>>
>>> On Wed, Nov 2, 2016 at 5:21 PM, Jim Apple  wrote:
>>>
>>> > Does anyone else have any thoughts, ideas, or concerns?
>>> >
>>> > On Fri, Oct 28, 2016 at 10:04 AM, Jim Apple 
>>> wrote:
>>> > > I like this idea a lot.
>>> > >
>>> > > On Fri, Oct 28, 2016 at 10:01 AM, Tim Armstrong <
>>> tarmstr...@cloudera.com>
>>> > wrote:
>>> > >> I think we should also have a period leading up to the branching where
>>> > we
>>> > >> can add incompatible changes guarded by flags. I think otherwise it
>>> > will be
>>> > >> a headache trying to stage things (realistically some
>>> > >> compatibility-breaking changes will be ready early and we don't want
>>> to
>>> > >> have them sitting off to the side bit-rotting).
>>> > >>
>>> > >> One case is getting Impala to work against the latest
>>> Hadoop/Hive/HBase
>>> > >> APIs - there were incompatible changes but it would be great to have
>>> > master
>>> > >> buildable against both versions.
>>> > >>
>>> > >> On Fri, Oct 28, 2016 at 9:51 AM, Jim Apple 
>>> > wrote:
>>> > >>
>>> > >>> The most recent release was 2.7.0. We have 32 issues that we might
>>> > >>> want to tackle for 3.0:
>>> > >>>
>>> > >>> https://issues.cloudera.org/issues/?filter=11830
>>> > >>>
>>> > >>> Does anyone have any thoughts about how to organize this? For
>>> instance
>>> > >>> we might decide:
>>> > >>>
>>> > >>> 0. Starting immediately, the community is encouraged to submit issues
>>> > >>> that would break compatibility. Detailed designs are also encouraged.
>>> > >>>
>>> > >>> 1. After 2.9.0, commits that break compatibility will be allowed in
>>> > >>> the "master" branch.
>>> > >>>
>>> > >>> 2. After 2.9.0 a call will go out for anyone who wants to get a
>>> > >>> compatibility-breaking patch in that they have 3 months to do so.
>>> > >>>
>>> > >>> 3. After three months, we'll cut a new release candidate and bump all
>>> > >>> JIRA issues that would break compatibility to Target Version: Impala
>>> > >>> 4.0
>>> > >>>
>>> > >>> Thoughts?
>>> > >>>
>>> >
>>>
>>
>>
>>
>> --
>> Henry Robinson
>> Software Engineer
>> Cloudera
>> 415-994-6679


c++ includes

2016-10-27 Thread Marcel Kornacker
Does anyone have experience with this tool for analyzing C++ includes?
http://include-what-you-use.org/

I have a feeling that our development productivity is somewhat
impacted by too-generous includes (instead of forward declarations),
and getting some (mechanized) help for cleaning that up would be
useful.

This would also be a positive newbie contribution.


Re: Podling Report Reminder - November 2016

2016-10-27 Thread Marcel Kornacker
Looks good to me as well. Jim, thanks for putting that together.

On Thu, Oct 27, 2016 at 11:27 AM, Todd Lipcon  wrote:
> +1, report looks good to me.
>
> -Todd
>
> On Wed, Oct 26, 2016 at 2:59 PM, Jim Apple  wrote:
>
>> My feeling is no, just because it isn't directly asked about. However,
>> my feeling is not very strong.
>>
>> Here are some of our plans, for the record:
>>
>> For community growth, various PMC members are busily tagging issues as
>> "newbie" right now. Once that's done, I will post about that to dev@
>> and user@. I'm also going to propose we blog and tweet about "newbie"
>> issues.
>>
>> For documentation, John Russell, one of our PMC members, and Laurel
>> Hale, another community member who is going to assist, have already
>> started working on this. The mechanics aren't all worked out, but the
>> documentation is buildable. Questions remain about where in git the
>> docs source will go, how it will be rendered, and so on.
>>
>> For migration of pre-commit CI, Taras Bobrovytsky, one of our PMC
>> members, is working on duplicating Apache Kudu's Jenkins setup.
>>
>> On Wed, Oct 26, 2016 at 2:39 PM, Tim Armstrong 
>> wrote:
>> > Looks good, thanks for writing this up (and all the work that went into
>> > those accomplishments).
>> >
>> > Is it worth mentioning what we plan to do about the top three issues?
>> >
>> > On Wed, Oct 26, 2016 at 2:34 PM, Jim Apple  wrote:
>> >
>> >> Dear dev@ and mentors:
>> >>
>> >> Here is my draft of our next status report. Your feedback is welcomed.
>> >> I plan to post this in about 48 hours.
>> >>
>> >> Impala is a high-performance C++ and Java SQL query engine for data
>> >> stored in Apache Hadoop-based clusters.
>> >>
>> >> > *   A list of the three most important issues to address in the move
>> >> > towards graduation.
>> >>
>> >> The three most important issues are:
>> >>
>> >> * Community growth
>> >>
>> >> * Transition of user documentation to Apache hosting
>> >>
>> >> * Migration of pre-commit continuous integration testing to
>> >> publicly-available infrastructure
>> >>
>> >> > *   Any issues that the Incubator PMC or ASF Board might wish/need to
>> be
>> >> > aware of
>> >>
>> >> No
>> >>
>> >> > *   How has the community developed since the last report
>> >>
>> >> Our last report was in August. Since then, we have five new
>> >> contributors who have authored patches, while two relatively recent
>> >> contributors who were active before August have continued their
>> >> involvement by authoring new patches. Traffic to our developer mailing
>> >> list has grown by about 60%.
>> >>
>> >> > *   How has the project developed since the last report.
>> >>
>> >> There have been 241 commits since the last report.
>> >>
>> >> Our status website now has 16 of the 17 listed work items complete. We
>> >> had our first Apache release and have a wiki page describing how to
>> >> perform the release in detail. We scrubbed our code using the RAT tool
>> >> for copyright notices not compliant with the ASF rules. We wrote
>> >> guidelines for contributors on how to become a committer and added a
>> >> new committer. All developer documentation has now moved to the
>> >> Apache-hosted wiki.
>> >>
>>
>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera


Re: Bulk JIRA email, or how to find good newbie tasks

2016-10-24 Thread Marcel Kornacker
I guess those can happen concurrently (ie, we need to review everything
labelled 'ramp-up').

On Mon, Oct 24, 2016 at 9:39 AM, Jim Apple  wrote:

> Why do we need to fix the "ramp*" bugs BEFORE we label "newbie" bugs?
>
> On Sat, Oct 22, 2016 at 1:12 PM, Marcel Kornacker 
> wrote:
> > Before you do that we need to sit down and reclassify what's currently
> > labeled 'ramp-up', some of those tasks aren't really suitable as that
> > (because they don't have an educational component).
> >
> > On Fri, Oct 21, 2016 at 10:53 AM, Jim Apple 
> wrote:
> >> SGTM. I will pick some of the PMC members and ask them to individually
> >> classify all of the ramp* bugs in a single component by marking the
> >> ones good for newbies as "newbie" and removing the "ramp*" tags from
> >> "newbie" issues.
> >>
> >> I hope this will produce a nice set of issues for newbies to choose
> >> from to help us entice new developers, testers, doc writers, and so
> >> on.
> >>
> >> On Fri, Oct 21, 2016 at 9:32 AM, Marcel Kornacker 
> wrote:
> >>> I agree, we have enough rules regarding jira labeling already.
> >>>
> >>> A "newbie" label to indicate tasks that someone without any/much
> >>> exposure to the codebase should be able to handle seems like a good
> >>> idea.
> >>>
> >>> "Ramp-up" really means something else (good learning experience, but
> >>> could be moderately complex and require some understanding of the
> >>> codebase).
> >>>
> >>> On Thu, Oct 20, 2016 at 10:32 PM, Henry Robinson 
> wrote:
> >>>> I still think that adding more complexity to the JIRA state space is
> only
> >>>> going to increase the cognitive burden of managing JIRAs, something I
> spend
> >>>> quite a lot of my life doing already. It would need to be a pretty
> big win
> >>>> for it to be worth it to me.
> >>>>
> >>>> Happy to go with the consensus, but would prefer to avoid more labels
> where
> >>>> possible.
> >>>>
> >>>> On 20 October 2016 at 16:44, Jim Apple  wrote:
> >>>>
> >>>>> Ah, I was only suggesting this "non-newbie" as a tool for committers
> >>>>> to use to help us find good newbie bugs, not as a tool for newbies.
> >>>>>
> >>>>> It will probably take a while to triage everything, and some people
> >>>>> will not get their triaging work done in a timely fashion, and new
> >>>>> bugs will show up and need triaging, and old bugs will (hopefully) be
> >>>>> fixed by newbies, requiring us to go and find new bugs to label
> >>>>> "newbie"
> >>>>>
> >>>>> On Thu, Oct 20, 2016 at 4:40 PM, Henry Robinson 
> >>>>> wrote:
> >>>>> > Right. Which 'newbie' is going to have that degree of JIRA nuance?
> >>>>> >
> >>>>> > I also feel it's better to show new contributors what they *could*
> do,
> >>>>> than
> >>>>> > to carefully mark things that they *should not* do.
> >>>>> >
> >>>>> > On 20 October 2016 at 16:28, Jim Apple 
> wrote:
> >>>>> >
> >>>>> >> So you're suggesting that there be no label for "this is triaged
> and
> >>>>> >> not suitable for newbies"?
> >>>>> >>
> >>>>> >> On Thu, Oct 20, 2016 at 4:14 PM, Henry Robinson <
> he...@cloudera.com>
> >>>>> >> wrote:
> >>>>> >> > I would strongly prefer not adding "non-newbie". It seems to
> have
> >>>>> limited
> >>>>> >> > use, and is another way to increase the state space of JIRA
> labels,
> >>>>> >> > components, etc, etc.
> >>>>> >> >
> >>>>> >> > Let's just find some good first-patch candidates and tag them.
> >>>>> >> >
> >>>>> >> > On 20 October 2016 at 16:03, Jim Apple 
> wrote:
> >>>>> >> >
> >>>>> >> >> Since I have heard no objection to these tag names, I'm picking
> >>>>> >> >> "newbie&

Re: crash in PHJ on master

2016-10-24 Thread Marcel Kornacker
Apparently that's the cause. I'll file a jira.

On Mon, Oct 24, 2016 at 8:48 AM, Alex Behm  wrote:

> Maybe you are using an "unusual" log level?
>
> On Mon, Oct 24, 2016 at 8:35 AM, Marcel Kornacker 
> wrote:
>
> > I'm seeing a crash in PartitionedHashJoinNode, is anyone experiencing the
> > same or know what's going on?
> >
> > Here is the stack:
> > C  [libExec.so+0x53706c]  std::unique_ptr > std::default_delete >::get() const+0x18
> > C  [libExec.so+0x54d6bc]
> >  impala::PartitionedHashJoinNode::NodeDebugString() const+0x296
> > C  [libExec.so+0x54d0bd]
> >  impala::PartitionedHashJoinNode::UpdateState(impala::
> > PartitionedHashJoinNode::HashJoinState)+0x37f
> > C  [libExec.so+0x53eeb5]
> >  impala::PartitionedHashJoinNode::Open(impala::RuntimeState*)+0x793
> > C  [libExec.so+0x512fde]
> >  impala::PartitionedAggregationNode::Open(impala::RuntimeState*)+0x4b4
> > C  [libRuntime.so+0x53ff0e]
> >  impala::PlanFragmentExecutor::OpenInternal()+0x3ae
> > C  [libRuntime.so+0x53fa33]  impala::PlanFragmentExecutor::Open()+0x229
> > C  [libService.so+0x30e7a8]
> >  impala::FragmentMgr::FragmentExecState::Exec()+0x66
> > C  [libService.so+0x31c0c8]
> >  impala::FragmentMgr::FragmentThread(impala::TUniqueId)+0x78
> > C  [libService.so+0x33863e]  boost::_mfi::mf1 > impala::TUniqueId>::operator()(impala::FragmentMgr*, impala::TUniqueId)
> > const+0x84
> > C  [libService.so+0x336c6b]  void
> > boost::_bi::list2,
> > boost::_bi::value >::operator() mf1 > impala::FragmentMgr, impala::TUniqueId>,
> > boost::_bi::list0>(boost::_bi::type, boost::_mfi::mf1 > impala::FragmentMgr, impala::TUniqueId>&, boost::_bi::list0&, int)+0x7d
> > C  [libService.so+0x335557]  boost::_bi::bind_t > boost::_mfi::mf1,
> > boost::_bi::list2,
> > boost::_bi::value > >::operator()()+0x3b
> > C  [libService.so+0x33288c]
> >  boost::detail::function::void_function_obj_invoker0 > bi::bind_t > boost::_mfi::mf1,
> > boost::_bi::list2,
> > boost::_bi::value > >,
> > void>::invoke(boost::detail::function::function_buffer&)+0x23
> > C  [libUtil.so+0x2e75f0]  boost::function0::operator()()
> const+0x52
> > C  [libUtil.so+0x2e6b97]  impala::Thread::SuperviseThread(std::string
> > const&, std::string const&, boost::function,
> > impala::Promise*)+0x2c5
> > C  [libUtil.so+0x2ee48e]  void
> > boost::_bi::list4,
> > boost::_bi::value, boost::_bi::value > ()()> >, boost::_bi::value*> >::operator() > (*)(std::string const&, std::string const&, boost::function,
> > impala::Promise*), boost::_bi::list0>(boost::_bi::type, void
> > (*&)(std::string const&, std::string const&, boost::function,
> > impala::Promise*), boost::_bi::list0&, int)+0xb2
> > C  [libUtil.so+0x2ee3d1]  boost::_bi::bind_t > const&, std::string const&, boost::function,
> > impala::Promise*), boost::_bi::list4 value,
> > boost::_bi::value, boost::_bi::value > ()()> >, boost::_bi::value*> >
> >::operator()()+0x3b
> > C  [libUtil.so+0x2ee32c]
> >  boost::detail::thread_data (*)(std::string
> > const&, std::string const&, boost::function,
> > impala::Promise*), boost::_bi::list4 value,
> > boost::_bi::value, boost::_bi::value > ()()> >, boost::_bi::value*> > > >::run()+0x1e
> >
> > I'm at git hash ff6b450  (IMPALA-4285/IMPALA-4286: Fixes for Parquet
> > scanner with MT_DOP > 0), ie, the latest on master.
> >
>


crash in PHJ on master

2016-10-24 Thread Marcel Kornacker
I'm seeing a crash in PartitionedHashJoinNode, is anyone experiencing the
same or know what's going on?

Here is the stack:
C  [libExec.so+0x53706c]  std::unique_ptr >::get() const+0x18
C  [libExec.so+0x54d6bc]
 impala::PartitionedHashJoinNode::NodeDebugString() const+0x296
C  [libExec.so+0x54d0bd]
 
impala::PartitionedHashJoinNode::UpdateState(impala::PartitionedHashJoinNode::HashJoinState)+0x37f
C  [libExec.so+0x53eeb5]
 impala::PartitionedHashJoinNode::Open(impala::RuntimeState*)+0x793
C  [libExec.so+0x512fde]
 impala::PartitionedAggregationNode::Open(impala::RuntimeState*)+0x4b4
C  [libRuntime.so+0x53ff0e]
 impala::PlanFragmentExecutor::OpenInternal()+0x3ae
C  [libRuntime.so+0x53fa33]  impala::PlanFragmentExecutor::Open()+0x229
C  [libService.so+0x30e7a8]
 impala::FragmentMgr::FragmentExecState::Exec()+0x66
C  [libService.so+0x31c0c8]
 impala::FragmentMgr::FragmentThread(impala::TUniqueId)+0x78
C  [libService.so+0x33863e]  boost::_mfi::mf1::operator()(impala::FragmentMgr*, impala::TUniqueId)
const+0x84
C  [libService.so+0x336c6b]  void
boost::_bi::list2,
boost::_bi::value >::operator(),
boost::_bi::list0>(boost::_bi::type, boost::_mfi::mf1&, boost::_bi::list0&, int)+0x7d
C  [libService.so+0x335557]  boost::_bi::bind_t,
boost::_bi::list2,
boost::_bi::value > >::operator()()+0x3b
C  [libService.so+0x33288c]
 boost::detail::function::void_function_obj_invoker0,
boost::_bi::list2,
boost::_bi::value > >,
void>::invoke(boost::detail::function::function_buffer&)+0x23
C  [libUtil.so+0x2e75f0]  boost::function0::operator()() const+0x52
C  [libUtil.so+0x2e6b97]  impala::Thread::SuperviseThread(std::string
const&, std::string const&, boost::function,
impala::Promise*)+0x2c5
C  [libUtil.so+0x2ee48e]  void
boost::_bi::list4,
boost::_bi::value, boost::_bi::value >, boost::_bi::value*> >::operator(),
impala::Promise*), boost::_bi::list0>(boost::_bi::type, void
(*&)(std::string const&, std::string const&, boost::function,
impala::Promise*), boost::_bi::list0&, int)+0xb2
C  [libUtil.so+0x2ee3d1]  boost::_bi::bind_t,
impala::Promise*), boost::_bi::list4,
boost::_bi::value, boost::_bi::value >, boost::_bi::value*> > >::operator()()+0x3b
C  [libUtil.so+0x2ee32c]
 boost::detail::thread_data,
impala::Promise*), boost::_bi::list4,
boost::_bi::value, boost::_bi::value >, boost::_bi::value*> > > >::run()+0x1e

I'm at git hash ff6b450  (IMPALA-4285/IMPALA-4286: Fixes for Parquet
scanner with MT_DOP > 0), ie, the latest on master.


Re: Bulk JIRA email, or how to find good newbie tasks

2016-10-22 Thread Marcel Kornacker
Before you do that we need to sit down and reclassify what's currently
labeled 'ramp-up', some of those tasks aren't really suitable as that
(because they don't have an educational component).

On Fri, Oct 21, 2016 at 10:53 AM, Jim Apple  wrote:
> SGTM. I will pick some of the PMC members and ask them to individually
> classify all of the ramp* bugs in a single component by marking the
> ones good for newbies as "newbie" and removing the "ramp*" tags from
> "newbie" issues.
>
> I hope this will produce a nice set of issues for newbies to choose
> from to help us entice new developers, testers, doc writers, and so
> on.
>
> On Fri, Oct 21, 2016 at 9:32 AM, Marcel Kornacker  wrote:
>> I agree, we have enough rules regarding jira labeling already.
>>
>> A "newbie" label to indicate tasks that someone without any/much
>> exposure to the codebase should be able to handle seems like a good
>> idea.
>>
>> "Ramp-up" really means something else (good learning experience, but
>> could be moderately complex and require some understanding of the
>> codebase).
>>
>> On Thu, Oct 20, 2016 at 10:32 PM, Henry Robinson  wrote:
>>> I still think that adding more complexity to the JIRA state space is only
>>> going to increase the cognitive burden of managing JIRAs, something I spend
>>> quite a lot of my life doing already. It would need to be a pretty big win
>>> for it to be worth it to me.
>>>
>>> Happy to go with the consensus, but would prefer to avoid more labels where
>>> possible.
>>>
>>> On 20 October 2016 at 16:44, Jim Apple  wrote:
>>>
>>>> Ah, I was only suggesting this "non-newbie" as a tool for committers
>>>> to use to help us find good newbie bugs, not as a tool for newbies.
>>>>
>>>> It will probably take a while to triage everything, and some people
>>>> will not get their triaging work done in a timely fashion, and new
>>>> bugs will show up and need triaging, and old bugs will (hopefully) be
>>>> fixed by newbies, requiring us to go and find new bugs to label
>>>> "newbie"
>>>>
>>>> On Thu, Oct 20, 2016 at 4:40 PM, Henry Robinson 
>>>> wrote:
>>>> > Right. Which 'newbie' is going to have that degree of JIRA nuance?
>>>> >
>>>> > I also feel it's better to show new contributors what they *could* do,
>>>> than
>>>> > to carefully mark things that they *should not* do.
>>>> >
>>>> > On 20 October 2016 at 16:28, Jim Apple  wrote:
>>>> >
>>>> >> So you're suggesting that there be no label for "this is triaged and
>>>> >> not suitable for newbies"?
>>>> >>
>>>> >> On Thu, Oct 20, 2016 at 4:14 PM, Henry Robinson 
>>>> >> wrote:
>>>> >> > I would strongly prefer not adding "non-newbie". It seems to have
>>>> limited
>>>> >> > use, and is another way to increase the state space of JIRA labels,
>>>> >> > components, etc, etc.
>>>> >> >
>>>> >> > Let's just find some good first-patch candidates and tag them.
>>>> >> >
>>>> >> > On 20 October 2016 at 16:03, Jim Apple  wrote:
>>>> >> >
>>>> >> >> Since I have heard no objection to these tag names, I'm picking
>>>> >> >> "newbie" and "non-newbie". I am going to start emailing some PMC
>>>> >> >> members to ask them to categorize issues.
>>>> >> >>
>>>> >> >> On Tue, Oct 18, 2016 at 9:57 AM, Jim Apple 
>>>> >> wrote:
>>>> >> >> > How shall we distinguish between the following three classes of
>>>> >> issues:
>>>> >> >> >
>>>> >> >> > 1. Un-triaged ramp-up issues
>>>> >> >> > 2. ramp-up issues that are not for newbies
>>>> >> >> > 3. newbie issues
>>>> >> >> >
>>>> >> >> > We could add two tags, "newbie" and "non-newbie". We could call the
>>>> >> >> > second tag something other than "non-newbie", like "second-patch"
>>>> or
>>>> >> >> > "sophomore".
>>>&

Re: Bulk JIRA email, or how to find good newbie tasks

2016-10-21 Thread Marcel Kornacker
I agree, we have enough rules regarding jira labeling already.

A "newbie" label to indicate tasks that someone without any/much
exposure to the codebase should be able to handle seems like a good
idea.

"Ramp-up" really means something else (good learning experience, but
could be moderately complex and require some understanding of the
codebase).

On Thu, Oct 20, 2016 at 10:32 PM, Henry Robinson  wrote:
> I still think that adding more complexity to the JIRA state space is only
> going to increase the cognitive burden of managing JIRAs, something I spend
> quite a lot of my life doing already. It would need to be a pretty big win
> for it to be worth it to me.
>
> Happy to go with the consensus, but would prefer to avoid more labels where
> possible.
>
> On 20 October 2016 at 16:44, Jim Apple  wrote:
>
>> Ah, I was only suggesting this "non-newbie" as a tool for committers
>> to use to help us find good newbie bugs, not as a tool for newbies.
>>
>> It will probably take a while to triage everything, and some people
>> will not get their triaging work done in a timely fashion, and new
>> bugs will show up and need triaging, and old bugs will (hopefully) be
>> fixed by newbies, requiring us to go and find new bugs to label
>> "newbie"
>>
>> On Thu, Oct 20, 2016 at 4:40 PM, Henry Robinson 
>> wrote:
>> > Right. Which 'newbie' is going to have that degree of JIRA nuance?
>> >
>> > I also feel it's better to show new contributors what they *could* do,
>> than
>> > to carefully mark things that they *should not* do.
>> >
>> > On 20 October 2016 at 16:28, Jim Apple  wrote:
>> >
>> >> So you're suggesting that there be no label for "this is triaged and
>> >> not suitable for newbies"?
>> >>
>> >> On Thu, Oct 20, 2016 at 4:14 PM, Henry Robinson 
>> >> wrote:
>> >> > I would strongly prefer not adding "non-newbie". It seems to have
>> limited
>> >> > use, and is another way to increase the state space of JIRA labels,
>> >> > components, etc, etc.
>> >> >
>> >> > Let's just find some good first-patch candidates and tag them.
>> >> >
>> >> > On 20 October 2016 at 16:03, Jim Apple  wrote:
>> >> >
>> >> >> Since I have heard no objection to these tag names, I'm picking
>> >> >> "newbie" and "non-newbie". I am going to start emailing some PMC
>> >> >> members to ask them to categorize issues.
>> >> >>
>> >> >> On Tue, Oct 18, 2016 at 9:57 AM, Jim Apple 
>> >> wrote:
>> >> >> > How shall we distinguish between the following three classes of
>> >> issues:
>> >> >> >
>> >> >> > 1. Un-triaged ramp-up issues
>> >> >> > 2. ramp-up issues that are not for newbies
>> >> >> > 3. newbie issues
>> >> >> >
>> >> >> > We could add two tags, "newbie" and "non-newbie". We could call the
>> >> >> > second tag something other than "non-newbie", like "second-patch"
>> or
>> >> >> > "sophomore".
>> >> >> >
>> >> >> > Thoughts?
>> >> >> >
>> >> >> > On Mon, Oct 17, 2016 at 10:27 PM, Marcel Kornacker <
>> >> mar...@cloudera.com>
>> >> >> wrote:
>> >> >> >> Please don't add that comment. :)
>> >> >> >>
>> >> >> >> What's currently labelled ramp-up is often not a good newbie task
>> >> (and
>> >> >> >> maybe not even a good ramp-up task). The best way to identify
>> newbie
>> >> >> >> tasks is for a few senior engineers to sift through the ramp-up
>> tasks
>> >> >> >> and pick out maybe a few dozen that truly qualify as newbie tasks.
>> >> >> >>
>> >> >> >> I'm happy to help out with that when I get back.
>> >> >> >>
>> >> >> >> On Mon, Oct 17, 2016 at 6:50 PM, Jim Apple 
>> >> >> wrote:
>> >> >> >>> The Impala JIRA has 129 tasks that have no assignee, are still
>> open,
>> >> >> >>> and are labelled ramp* (i.e. ramp-up, ramp-up-introductory,
>> etc.).
>> >> >> >>>
>> >> >> >>> I'd like to find which of those tasks are good tasks for someone
>> who
>> >> >> >>> is making their first Impala patch. I intend to promote those on
>> one
>> >> >> >>> or more of : the blog, the twitter account, this list, the user
>> >> list,
>> >> >> >>> helpwanted.apache.org, and so on.
>> >> >> >>>
>> >> >> >>> The tasks should be the kind of thing that someone won't need too
>> >> much
>> >> >> >>> hand-holding on, once their have their dev environment up and
>> >> working.
>> >> >> >>>
>> >> >> >>> To do this, I was thinking of adding a comment to all 129 tasks
>> to
>> >> ask
>> >> >> >>> the watchers of each issue if it should be labelled "newbie".
>> This
>> >> >> >>> will send hundreds of emails, which is a bummer, but it seems to
>> me
>> >> >> >>> like the best way to track the discussions and decisions.
>> >> >> >>>
>> >> >> >>> What does everyone think?
>> >> >>
>> >> >
>> >> >
>> >> >
>> >> > --
>> >> > Henry Robinson
>> >> > Software Engineer
>> >> > Cloudera
>> >> > 415-994-6679
>> >>
>> >
>> >
>> >
>> > --
>> > Henry Robinson
>> > Software Engineer
>> > Cloudera
>> > 415-994-6679
>>
>
>
>
> --
> Henry Robinson
> Software Engineer
> Cloudera
> 415-994-6679


Re: Bulk JIRA email, or how to find good newbie tasks

2016-10-17 Thread Marcel Kornacker
Please don't add that comment. :)

What's currently labelled ramp-up is often not a good newbie task (and
maybe not even a good ramp-up task). The best way to identify newbie
tasks is for a few senior engineers to sift through the ramp-up tasks
and pick out maybe a few dozen that truly qualify as newbie tasks.

I'm happy to help out with that when I get back.

On Mon, Oct 17, 2016 at 6:50 PM, Jim Apple  wrote:
> The Impala JIRA has 129 tasks that have no assignee, are still open,
> and are labelled ramp* (i.e. ramp-up, ramp-up-introductory, etc.).
>
> I'd like to find which of those tasks are good tasks for someone who
> is making their first Impala patch. I intend to promote those on one
> or more of : the blog, the twitter account, this list, the user list,
> helpwanted.apache.org, and so on.
>
> The tasks should be the kind of thing that someone won't need too much
> hand-holding on, once their have their dev environment up and working.
>
> To do this, I was thinking of adding a comment to all 129 tasks to ask
> the watchers of each issue if it should be labelled "newbie". This
> will send hundreds of emails, which is a bummer, but it seems to me
> like the best way to track the discussions and decisions.
>
> What does everyone think?


Multi-threading design

2016-09-20 Thread Marcel Kornacker
I've been thinking about a new approach to multi-threaded execution in
Impala and been collecting those thoughts in a design doc draft:

https://docs.google.com/document/d/1kpl7UB3DMhqtGmdjRNEc_JZvZ0_5oRNFe1SD_NFZVsc/edit?usp=sharing

Comments welcome.


kudu startup errors

2016-09-20 Thread Marcel Kornacker
Is anyone else seeing this with run-all.sh? I'm in a clean, up-to-date
tree. I also
just restarted ntp.

Starting kudu (Web UI - http://localhost:8051)
/home/marcel/impala/testdata/cluster/cdh5/node-1/etc/init.d/common:
line 35: 19501 Segmentation fault  (core dumped) "$CMD" "$@" &>
"$LOG_FILE"
Failed to start kudu-master. The end of the log
(/home/marcel/impala/testdata/cluster/cdh5/node-1/var/log/kudu-master.out)
is:
PC: @  0x188aff4 kudu::(anonymous namespace)::ShardedLRUCache::Lookup()
*** SIGSEGV (@0xb4994340) received by PID 19501 (TID 0x7fe789340700)
from PID 18446744072444527424; stack trace: ***
@ 0x7fe7aae57330 (unknown)
@  0x188aff4 kudu::(anonymous namespace)::ShardedLRUCache::Lookup()
@   0xa1dd31 kudu::codegen::CodeCache::Lookup()
@   0xa14f7c
kudu::codegen::CompilationManager::RequestRowProjector()
@   0x8b5907 kudu::tablet::(anonymous
namespace)::GenerateAppropriateProjector()
@   0x8b81fe kudu::tablet::MemRowSet::Iterator::Iterator()
@   0x8b8a13 kudu::tablet::MemRowSet::NewIterator()
@   0x8b8f2f kudu::tablet::MemRowSet::NewRowIterator()
@   0x8692b5 kudu::tablet::Tablet::CaptureConsistentIterators()
@   0x869d72 kudu::tablet::Tablet::Iterator::Init()
@   0x7a603e kudu::master::SysCatalogTable::VisitTables()
@   0x7b9beb kudu::master::CatalogManager::VisitTablesAndTablets()
@   0x7ba4ac
kudu::master::CatalogManager::VisitTablesAndTabletsTask()
@  0x1909e1e kudu::ThreadPool::DispatchThread()
@  0x1904b9a kudu::Thread::SuperviseThread()
@ 0x7fe7aae4f184 start_thread
@ 0x7fe7a9b1637d clone
@0x0 (unknown)
Error in /home/marcel/impala/testdata/bin/run-mini-dfs.sh at line 40:
$IMPALA_HOME/testdata/cluster/admin start_cluster
Error in testdata/bin/run-all.sh at line 42: tee
${IMPALA_CLUSTER_LOGS_DIR}/run-mini-dfs.log


Re: [DISCUSS] Removal of Llama support in next (2.8?) release of Impala

2016-09-18 Thread Marcel Kornacker
+1

On Sat, Sep 17, 2016 at 3:04 PM, Alex Behm  wrote:
> +1
>
> On Sat, Sep 17, 2016 at 2:52 PM, Henry Robinson  wrote:
>
>> Impala's support for Llama, which adapts Yarn for low-latency query
>> engines, has long since bitrotted, and never worked very well. I propose
>> removing it from the next version of Impala - I don't know of any
>> deployments relying on its use.
>>
>> The initial patch is here: http://gerrit.cloudera.org:8080/4445
>>
>> If there's any discussion to be had, let's do it here.
>>
>> Thanks!
>> Henry
>>


[Impala-ASF-CR] IMPALA-4098: Open()/Close() partition exprs once per fragment instance.

2016-09-09 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-4098: Open()/Close() partition exprs once per fragment 
instance.
..


Patch Set 2: Code-Review+2

(2 comments)

so much simpler.

http://gerrit.cloudera.org:8080/#/c/4340/2//COMMIT_MSG
Commit Message:

Line 10: exec nodes (and/or a data sink) within the same fragment, so the 
lifecycle
do you mean fragment or fragment instance? please clarify. (and if you really 
mean fragment, explain.)


http://gerrit.cloudera.org:8080/#/c/4340/2/be/src/runtime/descriptors.h
File be/src/runtime/descriptors.h:

Line 438:   Status PrepareAndOpenExprs(RuntimeState* state) const;
..PartitionExprs()? (or do you anticipate throwing more exprs into this?)


-- 
To view, visit http://gerrit.cloudera.org:8080/4340
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id179df645e500530f4418988f6ce64a03d669892
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3902: Scheduler improvements for running multiple fragment instances on a single backend

2016-09-07 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-3902: Scheduler improvements for running multiple 
fragment instances on a single backend
..


Patch Set 4:

(100 comments)

http://gerrit.cloudera.org:8080/#/c/4054/3//COMMIT_MSG
Commit Message:

> nit: would you mind wrapping the text in commit messages to 80 characters? 
Done


PS3, Line 10: oordinator for multi-threaded
> "at most one instance of each fragment" (there are already several fragment
Done


PS3, Line 13: eave
> typo
Done


PS3, Line 19: ring th
> fragment
Done


PS3, Line 19: ring th
> fragment
Done


PS3, Line 7: IMPALA-3902: Scheduler improvements for running multiple fragment 
instances on
   : a single backend
   : 
   : This is an extension of the scheduler and coordinator for 
multi-threaded
   : execution. It mainly removes the assumption of having one instance 
per
   : fragment per host. The approach taken here is to create parallel 
data
   : structures and control flow functions, where necessary, and 
otherwise to leave
   : the existing single-instance logic in place. The parallel 
structures' and
   : functions' names are prefixed with "Mt" to facilitate the 
enventual clean-up.
   : Not much of an attempt was made to factor out common functionality 
between
   : the Mt- and the single-threaded version, because the 
single-threaded version
   : will eventually disappear (once multi-threaded execution is the 
default)
   : and refactoring the existing code to fit into two parallel 
functions from
   : which it's being called might en
> nit: try and keep commit messages to 90 chars at most - otherwise they're v
Done


http://gerrit.cloudera.org:8080/#/c/4054/3/be/src/common/global-types.h
File be/src/common/global-types.h:

PS3, Line 30: int
> nit: say int32_t so the width is explicit.
we don't do that for the other ones.

what's the concern with not having the width be explicit?


http://gerrit.cloudera.org:8080/#/c/4054/3/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

PS3, Line 226:   /// same fragment, but this ID u
> It's getting kind of confusing with all the ids and indexes. It might be he
Done


PS3, Line 238: k ti
> why is this an 'id' but the other monotonically increasing *idx_ fields bel
it doesn't have to be. i asked that same question in Planner.thrift.


PS3, Line 244: Summed acr
> This name isn't very clear. Maybe coord_finstance_state_idx_,  coord_state_
once you stare at this code enough, the term 'state' becomes much more 
specific. :)

but i can rename it if it's still too vague.


PS3, Line 375: }
 : 
 : Status Coordinator::Exec(const QuerySchedule& schedule,
 : vector* output_expr_ctxs) {
 :   const TQueryExecRequest& request = schedule.request();
 :   DCHECK(request.fragments.size() > 0 || 
request.mt_plan_exec_info.size() > 0);
 :   needs_finalization_ = request.__isset.finalize_params;
 :   if (needs_finalization_) finalize_params_ = 
request.finalize_params;
 : 
 :   VLOG_QUERY << "Exec() query_id=" << schedule.query_id();
 :  
> I agree that this and subsequent functions should be in a utility class. Th
there are only two functions left for this utility class, which seems too 
meager. i moved those two into QuerySchedule.


PS3, Line 405: n
> space before
Done


Line 405:   // TODO: move initial setup into a separate function; right now 
part of it
> space before =
Done


PS3, Line 407:  instance state
> is_mt_exec for consistency above
Done


PS3, Line 408:   int32_t num_fragment_instances = schedule.GetTotalFInstanc
> can you add a comment about why this is necessary in the MT case only?
moved into QuerySchedule and added comment


PS3, Line 408: 2_t num_fragmen
> why is this only needed for MT?
Done


PS3, Line 443: T
> nit space
Done


PS3, Line 443: T
> space
Done


Line 443: MemTracker::GetQueryMemTracker(query_id_, query_limit, -1, 
pool_tracker, NULL);
> space before =
Done


PS3, Line 466: files_[coord_fragment.id], 0, obj_pool()));
> parentheses? Sometimes we have them in these cases.
i don't think this is unclear, so i'd say parentheses here don't add anything.


PS3, Line 560: lse {
 : SetExecPlanFragmentParams(
> this is a bit outdated since we know there is a coordinator fragment here, 
moved comment to where it makes sense


PS3, Line 651: UERY << "starting " << num_fragment_instances << " fragment 
instances for query "
 :  << query_id_;
 :   query

[Impala-ASF-CR] IMPALA-3902: Scheduler improvements for running multiple fragment instances on a single backend

2016-09-07 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has uploaded a new patch set (#4).

Change subject: IMPALA-3902: Scheduler improvements for running multiple 
fragment instances on a single backend
..

IMPALA-3902: Scheduler improvements for running multiple fragment instances on
a single backend

This is an extension of the scheduler and coordinator for multi-threaded
execution. It mainly removes the assumption of having one instance per
fragment per host. The approach taken here is to create parallel data
structures and control flow functions, where necessary, and otherwise to leave
the existing single-instance logic in place. The parallel structures' and
functions' names are prefixed with "Mt" to facilitate the enventual clean-up.
Not much of an attempt was made to factor out common functionality between
the Mt- and the single-threaded version, because the single-threaded version
will eventually disappear (once multi-threaded execution is the default)
and refactoring the existing code to fit into two parallel functions from
which it's being called might end up obscuring the code more than helping it.
Also, this code is relatively stable and having two parallel paths won't
cause much extra work (in terms of having to apply the same changes/fixes
twice) in the medium term.

Changes to data structures:
- QuerySchedule: per-instance and per-fragment structs with complete
  execution parameters (instead of partially relying on TQueryExecRequest);
  the per-instance execution parameter struct is a child of the per-fragment
  parameter struct
- explicit fragment id, with range 0..#fragments-1 (instead of relying on an
  index into an array in TQueryExecRequest)

Excluded:
- runtime filter handling
- anything related to RM

Change-Id: I240445199e22f009f4e72fdb8754eb8d77e3d680
---
M be/CMakeLists.txt
M be/src/common/global-types.h
M be/src/runtime/coordinator.cc
M be/src/runtime/coordinator.h
M be/src/scheduling/query-resource-mgr.cc
M be/src/scheduling/query-resource-mgr.h
M be/src/scheduling/query-schedule.cc
M be/src/scheduling/query-schedule.h
M be/src/scheduling/simple-scheduler.cc
M be/src/scheduling/simple-scheduler.h
M be/src/service/fragment-exec-state.cc
M be/src/service/fragment-mgr.cc
M be/src/service/impala-server.cc
M be/src/service/query-exec-state.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/util/CMakeLists.txt
M be/src/util/uid-util-test.cc
M be/src/util/uid-util.h
M common/thrift/ExecStats.thrift
M common/thrift/Frontend.thrift
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/Planner.thrift
M common/thrift/Types.thrift
M fe/src/main/java/com/cloudera/impala/common/TreeNode.java
M fe/src/main/java/com/cloudera/impala/planner/Planner.java
M fe/src/main/java/com/cloudera/impala/service/Frontend.java
M fe/src/test/java/com/cloudera/impala/planner/PlannerTestBase.java
29 files changed, 1,488 insertions(+), 546 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/54/4054/4
-- 
To view, visit http://gerrit.cloudera.org:8080/4054
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I240445199e22f009f4e72fdb8754eb8d77e3d680
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Marcel Kornacker 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 


[Impala-ASF-CR] IMPALA-3905: Add single-threaded scan node.

2016-09-07 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-3905: Add single-threaded scan node.
..


Patch Set 7: Code-Review+2

(3 comments)

http://gerrit.cloudera.org:8080/#/c/4113/6/be/src/exec/hdfs-scan-node-base.h
File be/src/exec/hdfs-scan-node-base.h:

Line 304:   bool unknown_disk_id_warned_;
> Done
?


http://gerrit.cloudera.org:8080/#/c/4113/7/be/src/exec/hdfs-scan-node.h
File be/src/exec/hdfs-scan-node.h:

Line 104: boost::lock_guard l(file_type_counts_);
move to .cc, this isn't perf-critical and doesn't need to be inlined (it's nice 
to keep extra fluff out of headers)


Line 141:   /// This lock cannot be taken together with any other lock except 
lock_.
what is required lock order?


-- 
To view, visit http://gerrit.cloudera.org:8080/4113
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I98cc7f970e1575dd83875609985e1877ada3d5e0
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3481: Use Kudu ScanToken API for scan ranges

2016-09-07 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-3481: Use Kudu ScanToken API for scan ranges
..


Patch Set 5: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/4120
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I160e5849d372755748ff5ba3c90a4651c804b220
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3481: Use Kudu ScanToken API for scan ranges

2016-09-07 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-3481: Use Kudu ScanToken API for scan ranges
..


Patch Set 4:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/4120/4/be/src/exec/kudu-scan-node-test.cc
File be/src/exec/kudu-scan-node-test.cc:

Line 53: class KuduScanNodeTest : public testing::Test {
> I can't say since I didn't add these to begin with, but while we're in the 
good idea, let's leave a todo.

the reason i don't like white-box testing is that you end up writing a lot of 
test harness code, which can have bugs, and you might end up  testing paths 
that aren't taken during actual query execution.


http://gerrit.cloudera.org:8080/#/c/4120/4/be/src/exec/kudu-scanner.cc
File be/src/exec/kudu-scanner.cc:

Line 114:   if (batch_done) break;
> To avoid returning Status::OK() again, and in case logic gets added after t
no, please leave as is.


http://gerrit.cloudera.org:8080/#/c/4120/4/fe/src/test/java/com/cloudera/impala/planner/KuduPlannerTest.java
File fe/src/test/java/com/cloudera/impala/planner/KuduPlannerTest.java:

Line 33: public class KuduPlannerTest extends PlannerTestBase {
> I don't know what the original reason was, but now that it's separate it's 
is this very slow?

the planner tests typically take something like 40s to run, a few seconds on 
top won't make a difference. also, it's good to have everything in one place, 
so i don't need to remember to run two planner tests when making planner 
changes.


-- 
To view, visit http://gerrit.cloudera.org:8080/4120
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I160e5849d372755748ff5ba3c90a4651c804b220
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3481: Use Kudu ScanToken API for scan ranges

2016-09-07 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-3481: Use Kudu ScanToken API for scan ranges
..


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4120/4/fe/src/main/java/com/cloudera/impala/planner/KuduScanNode.java
File fe/src/main/java/com/cloudera/impala/planner/KuduScanNode.java:

Line 131:* TODO: Remove when the columns are fetched from Kudu directly 
during analysis.
> you're not going to avoid a round trip to the kudu master to get tablet loc
those can all be addressed separately, but i don't see a reason to ditch 
functionality we already have. it's always harder to put it back later. and it 
would be a special case for kudu anyway.


-- 
To view, visit http://gerrit.cloudera.org:8080/4120
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I160e5849d372755748ff5ba3c90a4651c804b220
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3481: Use Kudu ScanToken API for scan ranges

2016-09-07 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-3481: Use Kudu ScanToken API for scan ranges
..


Patch Set 4:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/4120/4/fe/src/main/java/com/cloudera/impala/planner/KuduScanNode.java
File fe/src/main/java/com/cloudera/impala/planner/KuduScanNode.java:

Line 131:* TODO: Remove when the columns are fetched from Kudu directly 
during analysis.
> when you "open" a table, it fetches the metadata (columns, etc) from the ma
one of our design principles is to avoid round-trips to central servers during 
plan generation. (i believe your test cluster numbers, but i also believe it's 
not that hard to create an adversarial workload.)

let's remove this todo.


Line 139: tableSchema.getColumn(colName);
> We're more or less following the pattern of JDBC where trying to get an inv
or why not simply have getColumn() return null?

we have no special allegiance to jdbc.


-- 
To view, visit http://gerrit.cloudera.org:8080/4120
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I160e5849d372755748ff5ba3c90a4651c804b220
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3481: Use Kudu ScanToken API for scan ranges

2016-09-07 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-3481: Use Kudu ScanToken API for scan ranges
..


Patch Set 4:

(21 comments)

http://gerrit.cloudera.org:8080/#/c/4120/4/be/src/exec/kudu-scan-node-test.cc
File be/src/exec/kudu-scan-node-test.cc:

Line 53: class KuduScanNodeTest : public testing::Test {
we don't have unit tests for any other exec node. why is this an exception?


http://gerrit.cloudera.org:8080/#/c/4120/4/be/src/exec/kudu-scan-node.cc
File be/src/exec/kudu-scan-node.cc:

Line 247: VLOG(1) << "Thread started: " << name;
1 is too chatty. also, please use the macros.


Line 328:   VLOG(1) << "Thread done: " << name;
change


http://gerrit.cloudera.org:8080/#/c/4120/4/be/src/exec/kudu-scan-node.h
File be/src/exec/kudu-scan-node.h:

Line 39: /// are used to retreive the rows for this scan.
retrieve


Line 120:   void ScannerThread(const string& name, const string* initial_token);
rename to some verb


Line 125:   Status ProcessScanToken(KuduScanner* scanner, const string* 
scan_token);
why const*? can this be null?


http://gerrit.cloudera.org:8080/#/c/4120/4/be/src/exec/kudu-scanner.cc
File be/src/exec/kudu-scanner.cc:

Line 114:   if (batch_done) break;
why break instead of return?


http://gerrit.cloudera.org:8080/#/c/4120/4/be/src/exec/kudu-scanner.h
File be/src/exec/kudu-scanner.h:

Line 34: /// Wraps a Kudu client scanner to fetches row batches from Kudu. The 
Kudu client scanner
to fetch


Line 114:   int cur_kudu_batch_num_scanned_;
explain


http://gerrit.cloudera.org:8080/#/c/4120/4/be/src/scheduling/simple-scheduler.cc
File be/src/scheduling/simple-scheduler.cc:

Line 976: scan_range_length = 1000;
can't the fe get that out of the scan token?


http://gerrit.cloudera.org:8080/#/c/4120/4/fe/src/main/java/com/cloudera/impala/planner/KuduScanNode.java
File fe/src/main/java/com/cloudera/impala/planner/KuduScanNode.java:

Line 107: try (KuduClient client = new KuduClientBuilder(
break before 'new' instead


Line 131:* TODO: Remove when the columns are fetched from Kudu directly 
during analysis.
does kudu cache all metadata? where would the column info get fetched from?


Line 139: tableSchema.getColumn(colName);
note to kudu api designers: it's nice not to use exceptions to signal return 
values.

is that a sufficient test? what about data types?


Line 165:   // TODO: Consider only using the LEADER replica.
for better cache locality?


Line 167: TNetworkAddress address = new 
TNetworkAddress(replica.getRpcHost(),
break before new instead


Line 203: for (KuduPredicate predicate: kuduPredicates_) {
single line


Line 267:* bounds of the result can be evaluated with Expr::GetValue(NULL)).
this presumable has some side effects. describe them.


Line 295: // Cannot push prediates with null literal values
is there a kudu jira for this?


Line 349:   private static ComparisonOp 
getKuduOperator(BinaryPredicate.Operator op) {
use qualified name for ComparisonOp, easier to identify that it's a "foreign" 
type that way


Line 351: case GT: return ComparisonOp.GREATER;
indentation


http://gerrit.cloudera.org:8080/#/c/4120/4/fe/src/test/java/com/cloudera/impala/planner/KuduPlannerTest.java
File fe/src/test/java/com/cloudera/impala/planner/KuduPlannerTest.java:

Line 33: public class KuduPlannerTest extends PlannerTestBase {
could we merge this back into the regular planner test? what was the reason for 
separating this?


-- 
To view, visit http://gerrit.cloudera.org:8080/4120
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I160e5849d372755748ff5ba3c90a4651c804b220
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3905: Add single-threaded scan node.

2016-09-06 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-3905: Add single-threaded scan node.
..


Patch Set 6:

(22 comments)

http://gerrit.cloudera.org:8080/#/c/4113/6/be/src/exec/exec-node.cc
File be/src/exec/exec-node.cc:

Line 287:   if (state->query_options().mt_num_cores > 1) {
i thought we wanted to have this be >= 1 (and change the default to 0)


http://gerrit.cloudera.org:8080/#/c/4113/6/be/src/exec/hdfs-scan-node-base.cc
File be/src/exec/hdfs-scan-node-base.cc:

Line 154: Status HdfsScanNodeBase::Prepare(RuntimeState* state) {
this is insanely long. leave todo: break this up.


Line 373:   for (auto& filter: filter_ctxs_) 
RETURN_IF_ERROR(filter.expr->Open(state));
don't use auto here


Line 682: /// This function may be called from multiple threads, and we 
expect each
this also sounds like it's specific to the old scan node.


Line 763: lock_guard l(file_type_counts_lock_);
this seems specific to having multiple scanner threads.

it would be good to outline the eventual clean-up, when scanner threads go 
away. either with a central todo a the top (of the .h file) that lists 
everything, or with todos here.


http://gerrit.cloudera.org:8080/#/c/4113/6/be/src/exec/hdfs-scan-node-base.h
File be/src/exec/hdfs-scan-node-base.h:

Line 95: /// into a roe batch queue (via HdfsScanner::ProcessSplit()). Those 
specifics
RowBatch


Line 125:   virtual bool IsMultiThreaded() const = 0;
can we pick a different name for that? i'm worried about causing confusion 
(with other mt-related things, and this is specifically not relevant in the mt 
path).

something related to using a rowbatch queue?


Line 216:   /// Called by scanners when a range is complete.  Used to trigger 
done_ and
done_ has migrated into the old scanner, and you explicitly reference a 
rowbatch queue. update/reword comment.


Line 242:   typedef boost::unordered_map> 
PerVolumnStats;
std::


Line 266:   const std::vector filter_ctxs() const { return 
filter_ctxs_; }
const& ?


Line 304:   boost::unordered_set partition_ids_;
std::


Line 405:   SpinLock file_type_counts_lock_;
isn't this specific to having multiple scanner threads?


Line 447:   /// This must be called on Close() to unregister counters.
mention synchronization requirement?


http://gerrit.cloudera.org:8080/#/c/4113/6/be/src/exec/hdfs-scan-node-mt.cc
File be/src/exec/hdfs-scan-node-mt.cc:

Line 73: // TODO: This is probably not worth splitting the organisational 
cost of splitting
intentional duplication?


Line 74: // initialisation across two places. Move to before the scanner 
threads start.
reference to scanner threads


Line 84: RETURN_IF_ERROR(runtime_state_->io_mgr()->GetNextRange(
single line? if not, break before call/after macro?


http://gerrit.cloudera.org:8080/#/c/4113/6/be/src/exec/hdfs-scan-node.cc
File be/src/exec/hdfs-scan-node.cc:

Line 716
where did this go?


Line 92: RETURN_IF_ERROR(IssueInitialScanRanges(state));
much better


http://gerrit.cloudera.org:8080/#/c/4113/6/be/src/exec/hdfs-scan-node.h
File be/src/exec/hdfs-scan-node.h:

Line 44: /// HDFS-serialised data.
mention that this is not used on the non-mt path.


Line 61: /// recycled.
todo: remove once mt is fully functional


http://gerrit.cloudera.org:8080/#/c/4113/6/be/src/exec/hdfs-scanner.h
File be/src/exec/hdfs-scanner.h:

Line 116:   /// including the final one in Close().
this comment seems unrelated to the c'tor.


http://gerrit.cloudera.org:8080/#/c/4113/6/be/src/exec/scanner-context.h
File be/src/exec/scanner-context.h:

Line 299:   /// Always returns false if the scan_node_ is not multi-threaded.
this also deserves a clean-up todo.


-- 
To view, visit http://gerrit.cloudera.org:8080/4113
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I98cc7f970e1575dd83875609985e1877ada3d5e0
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4014: PARALLEL HEADERS ONLY: Introduce query-wide execution state.

2016-09-06 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-4014: PARALLEL HEADERS ONLY: Introduce query-wide 
execution state.
..


Patch Set 1:

(21 comments)

http://gerrit.cloudera.org:8080/#/c/4301/1/be/src/runtime/fragment-instance-state.h
File be/src/runtime/fragment-instance-state.h:

PS1, Line 39: fragment
> finstance
in comments, please spell it out (as fragment instance)


Line 62:   exec_params_(params) {
do we need to copy these? that's a fairly large struct. you already get them in 
Prepare(), maybe that's enough (or alternatively, pass them again into Open())


http://gerrit.cloudera.org:8080/#/c/4301/1/be/src/runtime/query-state.h
File be/src/runtime/query-state.h:

Line 38: /// This object is created once the first fragment instance for a 
query arrives to the
arrives at

don't comment on the qem here. transfer those comments to the qem, if that's 
needed.


Line 41: /// The lifetime of this class is maintained by using an explicit 
reference count that is
'is dictated by an explicit ...' or 'is guided by an explicit ...'


Line 67:   /// fragment instances arrive in a single RPC.
do we need this at all right now?


Line 73:   /// Delete all the FInstanceState objects. This is called when the 
QueryState
'delete all state'. (no need to list that state, in particular since what that 
is will be changing soon.)

don't comment on the caller.

*do* comment on externally observable behavior (i'm assuming it's not legal to 
call any other function of this class  after this call).


PS1, Line 79: RegisterFInstance
> Should this be renamed to include the fact that it does more than Register 
unless registration is a concept that shows up elsewhere in this class, no need 
to mention it here (esp. not in the name).

i'd also avoid 'schedule', we already use that verb in a different context (and 
i'm not sure what else we're trying to express other than 'execute' or 'start 
fragment instance').


Line 81:   /// Cancels a plan fragment that is running asynchronously.
remove references to plan fragments, unless that's what you mean.


Line 112:   int32_t num_current_references_;
why is num_active_finstances_ atomic, but not this?


Line 118:   /// by us and its lifetime lasts as long as the QueryState that 
owns it.
we are that qs.


Line 129:   FInstanceState* GetFInstanceState(
single line


http://gerrit.cloudera.org:8080/#/c/4301/1/be/src/runtime/runtime-state-new.h
File be/src/runtime/runtime-state-new.h:

Line 1: // Licensed to the Apache Software Foundation (ASF) under one
could you change runtime-state.h instead (and move the content into a new file 
before check-in)? it's hard to see the diff.


http://gerrit.cloudera.org:8080/#/c/4301/1/be/src/service/query-exec-mgr.h
File be/src/service/query-exec-mgr.h:

Line 40: /// offer itself up for destruction when its reference count (i.e. 
number of references
drop explanation of 'reference count'.


Line 56:   /// Receives a remote fragment instance and creates or uses (if it 
already exists) a
remove "receives a remote fragment instance" (what does that mean?)


Line 58:   /// instance to it to be executed on a separate thread.
should this really be called 'StartFInstance'?


Line 61:   /// was registered (like low memory). Otherwise, returns OK, which 
guarantees that the
you talk about the fragment being 'registered', but it's unclear what that 
means (because there are no related functions to that in this class - this 
isn't a concept visible to the caller of this class)


Line 74:   /// holds on to it. This is ensured by increasing the internal 
reference count of the
don't explain implementation here.

but: it is a requirement that every caller of GetQueryState() eventually calls 
ReleaseQueryState()


Line 84:   Status CancelFInstance(const TUniqueId& finstanceance_id);
param spelling


Line 102:   class Guard {
move this into QueryState. also, this needs to be public.


Line 140:   /// Runs in the fragment instance' execution thread. Calls 
ReleaseQueryState() to
don't comment on where this runs (the caller is in charge of that)


PS1, Line 138: /// Calls finstance_state->Prepare() and then 
finstance_state->state->Exec().
 :   /// The finstance_state must previously have been registered.
 :   /// Runs in the fragment instance' execution thread. Calls 
ReleaseQueryState() to
 :   /// decrease the refcount to the QueryState, which was 
reserved during fragment instance
 :   /// registration time.
 :   void ExecInstance(QueryState* query_state, FInstanceState* 
finstance_state);
> Should we move the whole method to the QueryState or even FInstanceState? I
this is presumably just a wrapp

[Impala-ASF-CR] IMPALA-4053: Address follow up comments for IMPALA-3610

2016-09-06 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-4053: Address follow up comments for IMPALA-3610
..


Patch Set 2:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/4306/2/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

Line 374
what's the rationale for leaving this here?


Line 264: /// it is disabled.
more interesting: how is the discarded or disabled state recorded in this 
struct?


Line 277:   vector* targets() { return &targets_; }
FilterTarget isn't referenced in the .h file, also move it here


Line 290:   /// Discards a filter as we will use it no more. A discarded filter 
consumes no memory.
don't comment on the intention of the caller. describe the externally visible 
behavior of this call instead.


Line 295: MemTracker* filter_mem_tracker, TPublishFilterParams* rpc_params);
describe externally visible behavior and role of parameters. just looking at 
the signature and the comment doesn't tell me what's going on.


Line 2075: 
remove blank line


Line 2077: 
and this. this section of code deals with applying updates, so it's a good idea 
to group it


Line 2082: for (FilterTarget target: *state->targets()) {
const FilterTarget&

since you're not updating the targets here, you can also add another accessor

const vector& targets() const { return ...}


Line 2090: state->AssignOutgoingFilter(params, filter_mem_tracker_.get(), 
rpc_params.get());
i find the control flow harder to follow than before, and the code has more 
special cases now. what was the reason for this change?


-- 
To view, visit http://gerrit.cloudera.org:8080/4306
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7499558cc9d3622a9f3b68904aa5fe92a113f579
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-HasComments: Yes


Re: Request for feedback: C++ Style Guide

2016-09-06 Thread Marcel Kornacker
Who wrote the original document? There are several items on there that
are the exact opposite of what we do (basically everything in the
'Classes' section).

I would prefer to remove that entire section. Anyone against that?

On Fri, Sep 2, 2016 at 9:59 AM, Tim Armstrong  wrote:
> Yes, several things are completely wrong. E.g. we never use c-style casts.
>
> On Fri, Sep 2, 2016 at 9:38 AM, Jim Apple  wrote:
>
>> I left a comment on the page - I'm not sure how much these reflect our
>> actual current practice.
>>
>> On Fri, Sep 2, 2016 at 9:36 AM, Lars Volker  wrote:
>> > After some confusion in reviews about how to format code I moved our
>> > internal C++ Style Guide wiki page to the Apache wiki and updated all
>> links
>> > in it. You can find it here:
>> >
>> > https://cwiki.apache.org/confluence/pages/viewpage.
>> action?pageId=65868536
>> >
>> > At some point in time someone seems to have started a list of pro and
>> cons,
>> > some of which are worded rather negative. Do we want to revisit those
>> > comments or the style guide even?
>> >
>> > I'm looking forward to any feedback. Thanks, Lars
>>


[Impala-ASF-CR] IMPALA-3610: Account for memory used by filters in the coordinator

2016-09-01 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-3610: Account for memory used by filters in the 
coordinator
..


Patch Set 11:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4066/11/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

Line 380: // TODO-MT: remove
> I guess child_trackers_locks_ in the process mem tracker may prevent this r
the QueryState tear-down will not happen until the refcount is 0, and the 
refcount won't be 0 if anyone still has a reference to anything reachable from 
QueryState. the goal is to simplify the tear-down dramatically and get away 
from all implicitly-managed memory.

please restore the todo.


-- 
To view, visit http://gerrit.cloudera.org:8080/4066
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3c52c8a1c2e79ef370c77bf264885fc859678d1b
Gerrit-PatchSet: 11
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4020: Handle external conflicting changes to HMS gracefully

2016-08-31 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-4020: Handle external conflicting changes to HMS 
gracefully
..


Patch Set 6: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/4161
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ic228efbcceb9ef6c165d0d9aeef7202581e3e46a
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: anujphadke 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: anujphadke 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3610: Account for memory used by filters in the coordinator

2016-08-31 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-3610: Account for memory used by filters in the 
coordinator
..


Patch Set 11: Code-Review+2

(5 comments)

http://gerrit.cloudera.org:8080/#/c/4066/10/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

Line 2110: DCHECK_EQ(state->bloom_filter(), 
reinterpret_cast(NULL));
move this todo


http://gerrit.cloudera.org:8080/#/c/4066/11/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

Line 281: 
you can remove all blank lines between the getters


Line 380: // TODO-MT: remove
> Can you remove this comment?
there shouldn't be references to any query-related state when the QueryState 
refcount goes to 0.

why would this still be necessary?


Line 2098:   // Complete partitioned or broadcast join filter case.
"complete filter case"


Line 2133:   //if (desc_.is_broadcast_join) return;
remove


-- 
To view, visit http://gerrit.cloudera.org:8080/4066
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3c52c8a1c2e79ef370c77bf264885fc859678d1b
Gerrit-PatchSet: 11
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3610: Account for memory used by filters in the coordinator

2016-08-31 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-3610: Account for memory used by filters in the 
coordinator
..


Patch Set 10:

(5 comments)

looks good, but i'd still like to see the filter swapping cleaned up.

http://gerrit.cloudera.org:8080/#/c/4066/9/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

Line 718: row.push_back(lexical_cast(v.first));
> then why not remove the 'const' in the for?
?


Line 2048: state->Disable(filter_mem_tracker_.get());
> this is an iterator, no?
you don't need to create refs to iterators


Line 2107: // cost. After this point, params.bloom_filter is an empty 
filter and should not be
> that's right, and FilterState has all the information it needs to do this c
?


http://gerrit.cloudera.org:8080/#/c/4066/10/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

Line 262: /// A filter may be disabled  if an always_true filter update is 
received, an OOM is hit,
"may be" is too vague; "is"? (what else may happen?)


Line 332:   /// True if the filter is permanently disabled for this query due 
to being disabled or
circular definition ("is disabled due to being disabled"). you already define 
the semantics of 'disabled' in the class comment, no need to repeat.


-- 
To view, visit http://gerrit.cloudera.org:8080/4066
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3c52c8a1c2e79ef370c77bf264885fc859678d1b
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-HasComments: Yes


Re: scoped_ptr -> unique_ptr?

2016-08-31 Thread Marcel Kornacker
Generally, we want to move away from implicitly-managed memory (d'tor
frees) to explicitly, centrally managed memory (in MemPools, and
d'tors become no-ops).

With that in mind, I would limit the amount of effort spent on
changing from one implicitly-managed model to another.

That said, I'm in favor of doing the scoped_/unique_ptr replacement in
the current codebase where it is necessary to move a reference. I'm
not sure where that would currently apply.

On Wed, Aug 31, 2016 at 11:30 AM, Tim Armstrong  wrote:
> I don't feel that strongly about it but, but I think scoped_ptr/unique_ptr
> is a useful distinction to document that the pointer isn't movable.
>
> I think it contains info both ways: if it's a unique_ptr it means we intend
> to (sometimes) transfer it, but if it's a scoped_ptr its lifetime is
> strictly bounded by the owner.
>
> I agree it's hard to accidentally transfer something but I think if we use
> the same type it's much easier to violate the implicit memory lifetime
> assumptions in a bit of code without warning.
>
> E.g. if you have class Parent that contains a scoped_ptr child_,
> then it's always safe for child_ to store a Parent* reference. But if you
> change it to unique_ptr then it's no longer obvious that this is
> safe without auditing references to child_ to make sure it isn't release()d
> or moved.
>
> On Wed, Aug 31, 2016 at 11:24 AM, Jim Apple  wrote:
>
>> I'm convinced. +1 to moving to ::std::unique_ptr.
>>
>> On Wed, Aug 31, 2016 at 11:22 AM, Henry Robinson 
>> wrote:
>> > On 31 August 2016 at 11:16, Jim Apple  wrote:
>> >
>> >> Why should we reduce our boost dependency?
>> >>
>> >
>> > Boost will sometimes break subtly (or unsubtly by changing APIs) between
>> > versions, is often not as well tested as stdlib implementations and does
>> > not have a standard. If there are reasonable std:: implementations of
>> > boost:: primitives, experience has shown it's usually a good idea to opt
>> > for the std
>> >
>> >
>> >>
>> >> Do you think there are places where scoped_ptr is used now where you
>> >> would want to keep it indefinitely if it were part of the standard and
>> >> not part of boost?
>> >>
>> >
>> > No, but I can't say I've audited every location. For our typical uses, I
>> > don't see a disadvantage to unique_ptr.
>> >
>> >
>> >
>> >>
>> >> On Wed, Aug 31, 2016 at 10:47 AM, Henry Robinson 
>> wrote:
>> >> > We use boost::scoped_ptr everywhere to handle scope-owned
>> heap-allocated
>> >> > objects. C++11 has std::unique_ptr for this. I'd like to get a
>> decision
>> >> on
>> >> > whether we should start standardising on unique_ptr. This is
>> particularly
>> >> > relevant for new code - should I call it out in code review?
>> >> >
>> >> > The most significant difference is that unique_ptr is moveable, which
>> >> means
>> >> > it can be used in collections (good!). It also means that badly
>> written
>> >> > code can allow scope-owned objects to escape their scope:
>> >> >
>> >> > private:
>> >> >   unique_ptr foo_;
>> >> >
>> >> > public:
>> >> >   unique_ptr get_foo() { return move(foo_); }
>> >> >
>> >> > or worse:
>> >> >
>> >> >   Foo* get_foo() { return foo_.release(); }
>> >> >
>> >> > In both cases you have to be quite explicit about the decision to
>> yield
>> >> > ownership of the owned object, and it seems to me that we should catch
>> >> this
>> >> > in code review.
>> >> >
>> >> > Since using unique_ptr in collections is so useful, and reducing our
>> >> boost
>> >> > dependency is generally worthwhile, I'm very much in favour of moving
>> to
>> >> > unique_ptr for future code, and at some point porting all the current
>> >> > scoped_ptr to unique_ptr.
>> >> >
>> >> > What do you think?
>> >> >
>> >> > Henry
>> >>
>> >
>> >
>> >
>> > --
>> > Henry Robinson
>> > Software Engineer
>> > Cloudera
>> > 415-994-6679
>>


[Impala-ASF-CR] IMPALA-3610: Account for memory used by filters in the coordinator

2016-08-31 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-3610: Account for memory used by filters in the 
coordinator
..


Patch Set 9:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/4066/9/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

Line 718: // Cast the const away to read non-const member functions. No 
modification is done.
> I can't call non-const functions otherwise, specifically state.targets() an
then why not remove the 'const' in the for?


Line 2048:   for (auto& filter: filter_routing_table_) {
> I don't understad, versus?
this is an iterator, no?


Line 2107: state->set_completion_time(query_events_->ElapsedTime());
> This has to be set only once we are sure that we aren't going to wait for a
that's right, and FilterState has all the information it needs to do this 
correctly.


Line 2119: if (state->desc().is_broadcast_join) {
> That would cause broadcast filters having to swap() twice, which is two tim
swap() means there's no copy (for the string field).


-- 
To view, visit http://gerrit.cloudera.org:8080/4066
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3c52c8a1c2e79ef370c77bf264885fc859678d1b
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3610: Account for memory used by filters in the coordinator

2016-08-30 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-3610: Account for memory used by filters in the 
coordinator
..


Patch Set 9:

(13 comments)

http://gerrit.cloudera.org:8080/#/c/4066/9/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

Line 262: /// A filter may be discarded if an always_true filter update is 
received, an OOM is hit,
you use both disabled and discarded to mean the same thing (i think). let's 
switch to disabled, because discarded sounds like it's not there anymore. 
rename the function to Disable(). mention that a disabled filter consumes no 
memory.


Line 303:   /// Resets bloom_filter_ to NULL, sets pending_count_ to 0, 
releases any memory
don't reference internal state in the comment of a public function (the reader 
should be able to make use of this class w/o understanding the internal state).

you already define what you mean by disabled in the class comment.


Line 390: // comprehensive logging of all mem-trackers.
add TODO-MT: remove


Line 718: // Cast the const away to read non-const member functions. No 
modification is done.
why do you need non-const access?


Line 726: for (const auto& target: *state.targets()) {
no auto


Line 2048:   for (auto& filter: filter_routing_table_) {
why make this a ref?


Line 2086: DCHECK_GT(state->pending_count(), 0);
from l2086 to l2090 can also go into ApplyUpdate()


Line 2099: // If the filter was discarded, we should send out an 
always_true filter immediately.
this comment doesn't describe the following statement


Line 2105: // No more filters are pending on this filter ID. Create a 
distribution payload and
"no more updates are..."


Line 2107: state->set_completion_time(query_events_->ElapsedTime());
move into ApplyUpdate


Line 2119: if (state->desc().is_broadcast_join) {
also swap in the filter in ApplyUpdate for broadcast joins, then you don't have 
to differentiate here. also, it makes the filter state more regular and easier 
to comprehend.


Line 2183:   pending_count_ = 0L;
does the pending count still matter at this point (and if so, why?)


http://gerrit.cloudera.org:8080/#/c/4066/9/be/src/util/bloom-filter-test.cc
File be/src/util/bloom-filter-test.cc:

Line 277:   BloomFilter::ToThrift(&bf3, &bf3_thrift);  
trailing spaces


-- 
To view, visit http://gerrit.cloudera.org:8080/4066
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3c52c8a1c2e79ef370c77bf264885fc859678d1b
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3930,IMPALA-2570: Fix shuffle insert hint with constant partition exprs.

2016-08-30 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-3930,IMPALA-2570: Fix shuffle insert hint with constant 
partition exprs.
..


Patch Set 5: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4162/5/fe/src/main/java/com/cloudera/impala/planner/DistributedPlanner.java
File fe/src/main/java/com/cloudera/impala/planner/DistributedPlanner.java:

Line 241:   partition = DataPartition.hashPartitioned(partitionExprs);
leave todo in hashPartitioned() to checkstate that the expr list isn't empty 
(don't add it, it might break some stuff :))


-- 
To view, visit http://gerrit.cloudera.org:8080/4162
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I1084d49c95b7d867eeac3297fd2016daff0ab687
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3930,IMPALA-2570: Fix shuffle insert hint with constant partition exprs.

2016-08-30 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-3930,IMPALA-2570: Fix shuffle insert hint with constant 
partition exprs.
..


Patch Set 4:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/4162/4/fe/src/main/java/com/cloudera/impala/planner/DistributedPlanner.java
File fe/src/main/java/com/cloudera/impala/planner/DistributedPlanner.java:

Line 197: if (insertStmt.hasNoShuffleHint()) return inputFragment;
i find this easier to read than the 3-valued "flag" from before


Line 249: DataPartition partition = 
DataPartition.hashPartitioned(partitionExprs);
change this c'tor to return an unpartitioned DataPartition if the expr list is 
empty?


-- 
To view, visit http://gerrit.cloudera.org:8080/4162
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I1084d49c95b7d867eeac3297fd2016daff0ab687
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4014: Introduce query-wide execution state.

2016-08-30 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-4014: Introduce query-wide execution state.
..


Patch Set 11:

> From the discussion today:
 > 
 > Points agreed on:
 > - Collapse Executors and states, i.e. collapse FragmentInstanceExecutor
 > into FragmentInstanceState, so that responsibilities and boundaries
 > of classes are clear.
 > 
 > - QueryExecMgr is still responsible for setting up the QueryState
 > and creating the fragment threads for now. Once the per-query RPC
 > is implemented, the model will change such that QueryExecMgr
 > creates a “query" thread which sets up the QueryState and starts
 > the fragment threads.

Small correction: once we have the per-query rpc we'll still have QEM create 
the QueryState and register it before starting the query thread (which then 
starts the fragment instance threads).

 > 
 > Additionally, I'll put out a separate patch with *only* headers
 > that are parallel with the existing headers, so that Lars can use
 > that to work on the per-query RPC.

-- 
To view, visit http://gerrit.cloudera.org:8080/3817
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I892091d6401acb2ea91ccb1623af54c6f9635e6c
Gerrit-PatchSet: 11
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-1657: Rework detection and reporting of corrupt table stats.

2016-08-30 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-1657: Rework detection and reporting of corrupt table 
stats.
..


Patch Set 1: Code-Review+2

(6 comments)

http://gerrit.cloudera.org:8080/#/c/4166/1/fe/src/main/java/com/cloudera/impala/planner/HdfsScanNode.java
File fe/src/main/java/com/cloudera/impala/planner/HdfsScanNode.java:

Line 359:* Also computes totalBytes_
, totalFiles_, hasCorruptTableStats_


Line 370:   if ((cardinality_ < -1) || (cardinality_ == 0 && 
tbl_.getTotalHdfsBytes() > 0)) {
remove () from (cardinality_ < -1) - i find that easier to read because it 
requires less parenthesis counting


Line 374: totalFiles_ += partitions_.get(0).getFileDescriptors().size();
another checkstate that we only have a single partition?


Line 377: // Nothing to scan. Definitely a cardinality of 0 even if we 
have no stats.
make this branch the first one to avoid negation


Line 414: if (!(cardinality_ >= 0 || cardinality_ == -1)) {
a bit easier: if (cardinality_ < -1)


http://gerrit.cloudera.org:8080/#/c/4166/1/testdata/workloads/functional-planner/queries/PlannerTest/hbase.test
File testdata/workloads/functional-planner/queries/PlannerTest/hbase.test:

Line 495: 04:HASH JOIN [INNER JOIN]
> Plan looks better, but I'm still double checking whether something is wrong
that generally seems to be the case as often as it is not. :)


-- 
To view, visit http://gerrit.cloudera.org:8080/4166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I1d3305791d96e1c23a901af7b7c109af9352bb44
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3610: Account for memory used by filters in the coordinator

2016-08-29 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-3610: Account for memory used by filters in the 
coordinator
..


Patch Set 5:

(10 comments)

http://gerrit.cloudera.org:8080/#/c/4066/5/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

Line 284:   /// coarsley for every filter update.
remove "coarsley" (sic)


Line 289:   // Apply an update to the aggregated filter with the received 
filter.
describe role of params


Line 292:   TPublishFilterParams* rpc_params, const TUniqueId& query_id);
in/out params go at end


Line 343:   // logging.
what do the scanner threads do with this mem tracker?


Line 2102: TPublishFilterParams* rpc_params, const TUniqueId& query_id) {
pass in Coordinator* instead of the mem tracker and query_id, that's more 
compact.


Line 2103:   filter_updates_received->Add(1);
move this out into the caller, there is no other use of this parameter (and it 
doesn't do anything to the filter state anyway).


Line 2105:   if (!desc.is_broadcast_join) {
if (is_broadcast_join) return;


Line 2115: disabled = true;
why don't you move this into Discard and then check that explicitly in 
UpdateFilter()? that would have made the bug more obvious.


Line 2131:   // 'bloom_filter' will process subsequent updates otherwise.
the reason this wasn't apparent is that it wasn't stated what a 
discarded/invalid filter state looks like. augment the struct definition (is 
there a reason why you'd want to distinguish between pending count == 0 and 
invalid? if so, state it.)


http://gerrit.cloudera.org:8080/#/c/4066/4/be/src/util/bloom-filter.h
File be/src/util/bloom-filter.h:

Line 82:   void Or(const BloomFilter& other);
> No it isn't. Only tests use it now.
in that case, please remove it.


-- 
To view, visit http://gerrit.cloudera.org:8080/4066
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3c52c8a1c2e79ef370c77bf264885fc859678d1b
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3610: Account for memory used by filters in the coordinator

2016-08-29 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-3610: Account for memory used by filters in the 
coordinator
..


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4066/4/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

PS4, Line 474: filter_mem_tracker_
> An issue here is that if there exists a coordinator fragment, this filter_m
all of the memory allocated for filters is specific to the query for which the 
filters are created.

where is the runtime state's mem tracker destroyed?


-- 
To view, visit http://gerrit.cloudera.org:8080/4066
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3c52c8a1c2e79ef370c77bf264885fc859678d1b
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3930,IMPALA-2570: Fix shuffle insert hint with constant partition exprs.

2016-08-29 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-3930,IMPALA-2570: Fix shuffle insert hint with constant 
partition exprs.
..


Patch Set 1:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/4162/1/fe/src/main/java/com/cloudera/impala/planner/DistributedPlanner.java
File fe/src/main/java/com/cloudera/impala/planner/DistributedPlanner.java:

Line 199: // do nothing if the input fragment is already appropriately 
partitioned
expand comment: what happens if partitionExprs contains constants (why do you 
need to take them into account?)


Line 231: Preconditions.checkState(partitionHint == null || partitionHint);
partitionHint doesn't describe the meaning. repartitionHint?


Line 232: if (partitionExprs.isEmpty()) {
can't you short-circuit this path and move it right below l206? are there other 
hint-related corner cases that need to be excluded by the if stmt in l209?


-- 
To view, visit http://gerrit.cloudera.org:8080/4162
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I1084d49c95b7d867eeac3297fd2016daff0ab687
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4014: Introduce query-wide execution state.

2016-08-29 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-4014: Introduce query-wide execution state.
..


Patch Set 11:

(43 comments)

this needs a follow-on discussion regarding the existing class hierarchy (and 
what it should turn into).

we should disentangle that before adding the per-query rpc functionality.

http://gerrit.cloudera.org:8080/#/c/3817/11/be/src/runtime/fragment-instance-exec-state.h
File be/src/runtime/fragment-instance-exec-state.h:

Line 39:   executor_(query_exec_state, exec_env, lambda(
please use a regular c++ lambda


Line 54:   Status Prepare();
yet more indirection.

the division of labor between this class and FragmentInstanceExecutor is 
unclear (who does the execution). i find the control flow very hard to follow.

here you have to go through the state to get to the executor. and the state in 
here is basically all static (exec params) aside from things like status, which 
are really execution-related.


Line 77:   /// Contains the context for the current fragment instance.
in general: separate w/ blank line if you have member variables with comments

this particular member is redundant, because it's already contained in 
exec_params_


Line 84:   boost::scoped_ptr exec_thread_;
unused?

if not: who is doing the execution then (who has the execution logic), this 
class or InstanceExecutor? this seems convoluted.


http://gerrit.cloudera.org:8080/#/c/3817/11/be/src/runtime/fragment-instance-executor.cc
File be/src/runtime/fragment-instance-executor.cc:

Line 113: cgroup = 
ExecEnv::GetInstance()->cgroups_mgr()->UniqueIdToCgroup(PrintId(query_id_, 
"_"));
long lines (here and elsewhere)


Line 336: &FragmentInstanceExecutor::ReportProfile, this));
indent 2 spaces for the 2nd level


http://gerrit.cloudera.org:8080/#/c/3817/11/be/src/runtime/fragment-instance-executor.h
File be/src/runtime/fragment-instance-executor.h:

Line 75:   /// TODO: Made reports per-query vs per fragment-instance.
make


Line 76:   typedef boost::function<
std::


Line 82:   FragmentInstanceExecutor(QueryState* query_exec_state, ExecEnv* 
exec_env,
might as well also call the param query_state.

drop exec_env, you can call GetInstance()


Line 145:   QueryState* query_exec_state() { return query_exec_state_; }
now that we're doing a global rename, let's drop 'state' uniformly


http://gerrit.cloudera.org:8080/#/c/3817/11/be/src/runtime/query-exec-state.cc
File be/src/runtime/query-exec-state.cc:

Line 64:   CreateFragmentInstanceState(exec_params);
does this get called from anywhere else? if not, inline here and get rid of 
function. there is too much indirection here, and no clear separation of 
responsibilities (in the current code Create does the registration, not 
Register).


Line 82: fragment_inst_exec_state_map_.insert(make_pair(GetInstanceIdx(
change line breaks to improve readability (don't break up a function call if 
you don't have to)


Line 106: Status QueryState::CancelPlanFragment(const TUniqueId& 
fragment_instance_id) {
this and the next function are simply wrapper around FIS member functions, it 
looks like you only need GetFIS and the caller can do the rest. this will make 
the code easier to follow because it removes a level of indirection


Line 109:   if (fragment_inst_exec_state.get() == NULL) {
why not move this into GetInstanceState()?


http://gerrit.cloudera.org:8080/#/c/3817/11/be/src/runtime/query-exec-state.h
File be/src/runtime/query-exec-state.h:

Line 57:   // TODO: Remove once we refcator code in coordinator.
spelling


Line 75:   void CleanFragmentInstanceStates();
clear?

also, shouldn't this be a general TearDown()?


Line 82:   FragmentInstanceState* CreateFragmentInstanceState(
does this need to be public, in addition to the preceding function? who calls 
this?


Line 114:   /// decrement this to 0 and clean up the query exec state.
move this comment to the actor class (qem).

rephrase (it's the last one to dereference this that will remove the state) if 
still useful.


Line 115:   int32_t num_current_references_;
not atomic? (or does it even make a difference?)


Line 118:   SpinLock fragment_inst_exec_state_map_lock_;
i started using finstance instead of frag_inst or fragment_inst or frag_instance


http://gerrit.cloudera.org:8080/#/c/3817/11/be/src/runtime/runtime-state.h
File be/src/runtime/runtime-state.h:

Line 71:   QueryState* query_state = NULL);
pass in FIS (which contains a pointer to the query state as well as the exec 
params)


http://gerrit.cloudera.org:8080/#/c/3817/11/be/src/service/client-request-exec-state.h
File be/src/service/client-request-exec-state.h:

Line 47: /// Execution state of a query and also captures all state required 
for servicing
this is limited to client-related state. differentiate more clearly against 
QueryState.


Line 57: 

[Impala-ASF-CR] IMPALA-4014: Introduce query-wide execution state.

2016-08-28 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-4014: Introduce query-wide execution state.
..


Patch Set 11:

(2 comments)

As discussed in person earlier, we decided the best way forward would be to 
split this into header changes + implementation (the header changes will allow 
Lars to start working on the per-query rpc independently).

We also decided to introduce parallel headers to the existing ones (so we'll 
have FragmentInstanceState and FragmentExecState).

http://gerrit.cloudera.org:8080/#/c/3817/11/be/src/runtime/fragment-instance-exec-state.h
File be/src/runtime/fragment-instance-exec-state.h:

Line 18: #ifndef IMPALA_SERVICE_FRAGMENT_EXEC_STATE_H
rename to fragment-instance-state.{cc,h}


http://gerrit.cloudera.org:8080/#/c/3817/11/be/src/runtime/query-exec-state.h
File be/src/runtime/query-exec-state.h:

Line 18: #ifndef IMPALA_RUNTIME_QUERY_EXEC_STATE_H
rename to query-state.{cc,h}


-- 
To view, visit http://gerrit.cloudera.org:8080/3817
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I892091d6401acb2ea91ccb1623af54c6f9635e6c
Gerrit-PatchSet: 11
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3610: Account for memory used by filters in the coordinator

2016-08-27 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-3610: Account for memory used by filters in the 
coordinator
..


Patch Set 2:

(11 comments)

http://gerrit.cloudera.org:8080/#/c/4066/2/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

Line 431:   filter_mem_tracker_.reset(new MemTracker(-1, -1, "Runtime Filter 
(Coordinator)",
> The query_mem_tracker() is set only by L425 based on whether there exists a
never mind, it's a descendent of the pool tracker.


Line 2040:   swap(rpc_params->bloom_filter, *filter);
> Done.
initialize state->bloom_filter to an empty struct, then do the swap.


http://gerrit.cloudera.org:8080/#/c/4066/4/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

Line 2010:   if (state->bloom_filter.get() == NULL) {
this is a very coarse lock. most of the code after l2037 is specific to state 
and doesn't need a global lock.

leave todo in FilterState to add lock there and reduce holding time of 
filter_lock_.


Line 2037:   // read.
almost everything from here to l2056 updates 'state'. move into ApplyUpdate().


Line 2051:   for (const auto& target_idx: target_fragment_instance_idxs) {
swap instead


Line 2057: };
follow with blank line, there is a logical break here (first you updated state, 
now you compute payload)


Line 2108
why did you move this assignment?

if it's necessary for correctness, explain that with a brief comment.


http://gerrit.cloudera.org:8080/#/c/4066/4/be/src/util/bloom-filter.cc
File be/src/util/bloom-filter.cc:

Line 104:   static const double ik = 1.0 / BUCKET_WORDS;
single line?


http://gerrit.cloudera.org:8080/#/c/4066/4/be/src/util/bloom-filter.h
File be/src/util/bloom-filter.h:

Line 82:   void Or(const BloomFilter& other);
is this still getting called?


Line 106:  private:
no need to repeat param types in function name.

missing function comment.

group with other Or


http://gerrit.cloudera.org:8080/#/c/4066/4/testdata/workloads/functional-query/queries/QueryTest/runtime_row_filters_phj.test
File 
testdata/workloads/functional-query/queries/QueryTest/runtime_row_filters_phj.test:

Line 85
can't this run with a smaller limit?


-- 
To view, visit http://gerrit.cloudera.org:8080/4066
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3c52c8a1c2e79ef370c77bf264885fc859678d1b
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-HasComments: Yes


[Impala-CR](cdh5-trunk) IMPALA-3823: Add timer to measure Parquet footer reads

2016-08-26 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-3823: Add timer to measure Parquet footer reads
..


Patch Set 5:

what's going on with this?

-- 
To view, visit http://gerrit.cloudera.org:8080/3576
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Icf87bad90037dd0cea63b10c537382ec0f980cbf
Gerrit-PatchSet: 5
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-2932: Extend DistributedPlanner to account for hash table build cost

2016-08-25 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-2932: Extend DistributedPlanner to account for hash 
table build cost
..


Patch Set 3: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/4098
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I03a0f56f69c8deae68d48dfdb9dc95b71aec11f1
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: No


Re: [VOTE] Bylaw change to make branch creation or deletion lazy consensus

2016-08-25 Thread Marcel Kornacker
+1 (binding)

On Thu, Aug 25, 2016 at 1:48 PM, Jim Apple  wrote:
> Oh, and: following the bylaws, voting will be open for 72 hours.
>
> On Thu, Aug 25, 2016 at 1:15 PM, Jim Apple  wrote:
>> http://gerrit.cloudera.org:8080/4053
>>
>> I am proposing we change the bylaws so that a branch can be created or
>> deleted by lazy consensus of the PMC: "Lazy consensus requires no -1
>> votes ('silence gives assent')"
>>
>> Right now the bylaws do not say what kind of vote is required for a
>> branch creation or deletion.
>>
>> This is a change to the bylaws, and so requires lazy majority (3
>> binding +1 votes and more binding +1 votes than -1 votes) from PMC
>> members before it can be committed.
>>
>> I started a [DISCUSS] thread last week that had only one response.
>>
>> If you are on the PMC, your emailed vote on this thread (+1 or -1)
>> would be good. Branch creation is part of my plan for working on our
>> first release candidate.plus or minus


[Impala-ASF-CR] IMPALA-1659: Netezza compatibility functions: metadata

2016-08-25 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-1659: Netezza compatibility functions: metadata
..


Patch Set 6: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/4063
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9b5d1009bbf42acc175a942d2df484e1c64822ca
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zoltan Ivanfi 
Gerrit-Reviewer: Greg Rahn 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Ivanfi 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3394: Add tests, make BackendConfig own class, refactor

2016-08-24 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-3394: Add tests, make BackendConfig own class, refactor
..


Patch Set 2: Code-Review+2

(2 comments)

http://gerrit.cloudera.org:8080/#/c/4116/2/be/src/scheduling/backend-config-test.cc
File be/src/scheduling/backend-config-test.cc:

Line 32:   vector backends { (MakeNetworkAddress("localhost", 
1001)) };
can you check the google style guide whether the {} require spaces?


http://gerrit.cloudera.org:8080/#/c/4116/1/be/src/util/network-util.h
File be/src/util/network-util.h:

Line 55: TBackendDescriptor MakeBackendDescriptor(const Hostname& hostname, 
const IpAddr& ip,
> I tried to follow the surrounding code here. IMO changing this to an output
leave it in.

we'll need to come up with some coherent guidelines.


-- 
To view, visit http://gerrit.cloudera.org:8080/4116
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2d3acb6f68b16ca0af06dad0098d7ec1eff41202
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-HasComments: Yes


[Impala-ASF-CR] Remove unused MemTracker debug logging

2016-08-24 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: Remove unused MemTracker debug logging
..


Patch Set 1: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/4117
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9871839938ff8f7a56ca0fa5f991cce9de6ada79
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3394: Add tests, make BackendConfig own class, refactor

2016-08-24 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-3394: Add tests, make BackendConfig own class, refactor
..


Patch Set 1:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/4116/1/be/src/scheduling/backend-config-test.cc
File be/src/scheduling/backend-config-test.cc:

Line 31:   vector backends;
you can use a {} c'tor


Line 62:   const auto& backend_list = 
backend_config.GetBackendListForHost("10.0.0.1");
auto only for iterators


http://gerrit.cloudera.org:8080/#/c/4116/1/be/src/scheduling/backend-config.cc
File be/src/scheduling/backend-config.cc:

Line 56:   BackendList* be_descs = &backend_map_[be_desc.ip_address];
you can use a ref here


http://gerrit.cloudera.org:8080/#/c/4116/1/be/src/scheduling/backend-config.h
File be/src/scheduling/backend-config.h:

Line 74: typedef std::shared_ptr BackendConfigPtr;
best left to the user of this class


http://gerrit.cloudera.org:8080/#/c/4116/1/be/src/scheduling/simple-scheduler.cc
File be/src/scheduling/simple-scheduler.cc:

Line 217:   for (const auto& backend: backends) {
auto only for iterators


http://gerrit.cloudera.org:8080/#/c/4116/1/be/src/util/network-util.cc
File be/src/util/network-util.cc:

Line 159: 
remove one blank line


http://gerrit.cloudera.org:8080/#/c/4116/1/be/src/util/network-util.h
File be/src/util/network-util.h:

Line 27
this only ever returned a single address?


Line 55: TBackendDescriptor MakeBackendDescriptor(const Hostname& hostname, 
const IpAddr& ip,
output param


-- 
To view, visit http://gerrit.cloudera.org:8080/4116
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2d3acb6f68b16ca0af06dad0098d7ec1eff41202
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3944: Fix crash in SimpleScheduler

2016-08-24 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-3944: Fix crash in SimpleScheduler
..


Patch Set 3: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/4102
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I1c2c58c2f9e7de0936aba03dcce86cdc4b31a866
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3610: Account for memory used by filters in the coordinator

2016-08-24 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-3610: Account for memory used by filters in the 
coordinator
..


Patch Set 2:

(15 comments)

http://gerrit.cloudera.org:8080/#/c/4066/2//COMMIT_MSG
Commit Message:

Line 25: the output broadcast structure without copying.
please add this explanation/invariant (which is very helpful for understanding 
the code) to the declaration of the affected data structures in the .h file.


http://gerrit.cloudera.org:8080/#/c/4066/2/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

Line 431:   filter_mem_tracker_.reset(new MemTracker(-1, -1, "Runtime Filter 
(Coordinator)",
move into c'tor, this is setup that can be done prior to execution.


Line 1965:   unordered_set target_fragment_instance_idxs;
move to where it's used


Line 1979: // Check if the filter has already been sent, which could happen 
in three cases: if
turn into bullet points, easier to read


Line 1995: --state->pending_count;
turn this block into a member fn for state (ApplyUpdate?), that'll make it 
easier to follow


Line 2006:   // Discard as one missing update means a correct filter 
cannot be produced.
discard,


Line 2007:   state->disabled = true;
move into discard


Line 2013: // TODO: Implement BloomFilter::Or(const 
ThriftBloomFilter&)
if we already have problems with memory spikes, shouldn't this be addressed? 
there is no need to create a superfluous in-memory filter instead of coding Or 
to go directly against the thrift structure (do we ever need the non-thrift 
version?)


Line 2026: for (const auto& target: state->targets) {
remove auto unless it's an stl iterator


Line 2035:   // move the payload from the request rather than copy it and 
take double the memory
provide that information for the data structure in the .h file


Line 2040:   swap(rpc_params->bloom_filter, *filter);
why not do this for FilterState in the block above, rather than just the rpc 
param. and while we're at it, why not also do it for the first oncoming 
non-broadcast filter?


Line 2051:   for (const auto& target_idx: target_fragment_instance_idxs) {
don't use auto here, this isn't an iterator


Line 2054: auto cb = [rpc_params, addr = fragment_inst->impalad_address(),
bind was more compact; please revert


http://gerrit.cloudera.org:8080/#/c/4066/2/be/src/runtime/coordinator.h
File be/src/runtime/coordinator.h:

Line 195:   void Done();
TearDown? done is very generic.


Line 399: std::unique_ptr bloom_filter;
is this why you're including the filter header?

move FilterState into .cc instead (and remove include), we need to curtail 
rampant includes.


-- 
To view, visit http://gerrit.cloudera.org:8080/4066
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3c52c8a1c2e79ef370c77bf264885fc859678d1b
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-HasComments: Yes


[Impala-ASF-CR] Fix dynamic linking for Impala

2016-08-24 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: Fix dynamic linking for Impala
..


Patch Set 1: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/4108
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2fb1efd600bfaf17de730186fc80a0dbd976a049
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Martin Grund 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3944: Fix crash in SimpleScheduler

2016-08-24 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-3944: Fix crash in SimpleScheduler
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4102/2/be/src/scheduling/simple-scheduler.cc
File be/src/scheduling/simple-scheduler.cc:

Line 266: return;
should this be return or continue? seems like we're ignoring a lot more in the 
return case (so at least change the log message?)


-- 
To view, visit http://gerrit.cloudera.org:8080/4102
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I1c2c58c2f9e7de0936aba03dcce86cdc4b31a866
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2932: Extend DistributedPlanner to account for hash table build cost

2016-08-24 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-2932: Extend DistributedPlanner to account for hash 
table build cost
..


Patch Set 3: Code-Review+1

this needs verification of the plan changes for tpc-h/-ds on a larger data set.

-- 
To view, visit http://gerrit.cloudera.org:8080/4098
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I03a0f56f69c8deae68d48dfdb9dc95b71aec11f1
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4014: Introduce query-wide execution state.

2016-08-24 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-4014: Introduce query-wide execution state.
..


Patch Set 10:

i'll do the next round once sailesh has addressed henry's comments.

-- 
To view, visit http://gerrit.cloudera.org:8080/3817
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I892091d6401acb2ea91ccb1623af54c6f9635e6c
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4014: Introduce query-wide execution state.

2016-08-24 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-4014: Introduce query-wide execution state.
..


Patch Set 10:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/3817/10/be/src/runtime/fragment-instance-exec-state.h
File be/src/runtime/fragment-instance-exec-state.h:

PS10, Line 33: FragmentInstanceExecState
> Let's consider FragmentInstanceState and QueryState (if they're not executi
i think that's a good idea, long names aren't necessarily more readable or 
informative.


-- 
To view, visit http://gerrit.cloudera.org:8080/3817
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I892091d6401acb2ea91ccb1623af54c6f9635e6c
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3944: Fix crash in SimpleScheduler

2016-08-24 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-3944: Fix crash in SimpleScheduler
..


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/4102/1/be/src/scheduling/simple-scheduler.cc
File be/src/scheduling/simple-scheduler.cc:

Line 261: VLOG(1) << "Ignoring subscription request with empty IP 
address from subscriber: "
describe in comment why this would be empty.


Line 262:   << be_desc.address;
indent 4 spaces


-- 
To view, visit http://gerrit.cloudera.org:8080/4102
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I1c2c58c2f9e7de0936aba03dcce86cdc4b31a866
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2932: Extend DistributedPlanner to account for hash table build cost

2016-08-23 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-2932: Extend DistributedPlanner to account for hash 
table build cost
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4098/2/fe/src/main/java/com/cloudera/impala/planner/DistributedPlanner.java
File fe/src/main/java/com/cloudera/impala/planner/DistributedPlanner.java:

Line 461:   partitionCost = Math.round(lhsNetworkCost + rhsNetworkCost + 
rhsDataSize);
you're not guaranteed to have computed rhsDataSize at this point


-- 
To view, visit http://gerrit.cloudera.org:8080/4098
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I03a0f56f69c8deae68d48dfdb9dc95b71aec11f1
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2932: Extend DistributedPlanner to account for hash table build cost

2016-08-23 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-2932: Extend DistributedPlanner to account for hash 
table build cost
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4098/1/fe/src/main/java/com/cloudera/impala/planner/DistributedPlanner.java
File fe/src/main/java/com/cloudera/impala/planner/DistributedPlanner.java:

Line 463:   partitionCost = Math.round(lhsCost + 2 * rhsCost);
this isn't correct, because rhsCost is the network cost (which can be 0 if the 
rhs is already partitioned in the desired way). you need to account for the 
hash table build cost independently.


-- 
To view, visit http://gerrit.cloudera.org:8080/4098
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I03a0f56f69c8deae68d48dfdb9dc95b71aec11f1
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3988: Only use first 96 bits of query id

2016-08-22 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-3988: Only use first 96 bits of query id
..


Patch Set 9: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/4065
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia56a03ed9a1d7e77c72b66a01cd48c5b6bf3624f
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Marcel Kornacker 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3988: Only use first 96 bits of query id

2016-08-22 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has submitted this change and it was merged.

Change subject: IMPALA-3988: Only use first 96 bits of query id
..


IMPALA-3988: Only use first 96 bits of query id

This adds utility functions in uid-util.h to create query and instance
ids and convert between them. It also adapts SimpleScheduler to
utilize those functions when creating the instance id
(TPlanFragmentInstanceCtx.fragment_instance_id).

Change-Id: Ia56a03ed9a1d7e77c72b66a01cd48c5b6bf3624f
Reviewed-on: http://gerrit.cloudera.org:8080/4065
Reviewed-by: Marcel Kornacker 
Tested-by: Marcel Kornacker 
---
M be/src/scheduling/simple-scheduler.cc
M be/src/service/impala-server.cc
M be/src/util/CMakeLists.txt
A be/src/util/uid-util-test.cc
M be/src/util/uid-util.h
M common/thrift/ImpalaInternalService.thrift
6 files changed, 96 insertions(+), 10 deletions(-)

Approvals:
  Marcel Kornacker: Looks good to me, approved; Verified



-- 
To view, visit http://gerrit.cloudera.org:8080/4065
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: Ia56a03ed9a1d7e77c72b66a01cd48c5b6bf3624f
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Marcel Kornacker 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho 


[Impala-ASF-CR] IMPALA-3988: Only use first 96 bits of query id

2016-08-22 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-3988: Only use first 96 bits of query id
..


Patch Set 9: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/4065
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia56a03ed9a1d7e77c72b66a01cd48c5b6bf3624f
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Marcel Kornacker 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho 
Gerrit-HasComments: No


[Impala-ASF-CR] PREVIEW: IMPALA-2550 Introduce query-wide execution context.

2016-08-21 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: PREVIEW: IMPALA-2550 Introduce query-wide execution context.
..


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/3817/5/be/src/runtime/query-exec-state.h
File be/src/runtime/query-exec-state.h:

Line 101:   /// referenced as a shared_ptr to allow asynchronous calls to 
CancelPlanFragment()
> Having shared_ptrs are how it was done before. I've changed it to use raw p
they are, and i have a patch out (3988) that embeds the instance index into the 
instance id in a way that makes it easy to extract the index, given just the 
instance id (you won't need the query id). the index will always be 
0..#instances-1.


-- 
To view, visit http://gerrit.cloudera.org:8080/3817
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I892091d6401acb2ea91ccb1623af54c6f9635e6c
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3988: Only use first 96 bits of query id

2016-08-21 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-3988: Only use first 96 bits of query id
..


Patch Set 8: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/4065
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia56a03ed9a1d7e77c72b66a01cd48c5b6bf3624f
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Marcel Kornacker 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3988: Only use first 96 bits of query id

2016-08-21 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-3988: Only use first 96 bits of query id
..


Patch Set 8:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/4065/8/be/src/scheduling/simple-scheduler.cc
File be/src/scheduling/simple-scheduler.cc:

PS8, Line 505: int instance_idx = num_fragment_instances + j;
> it would be clearer as:
sure. i'll change that in the next set of changes.


http://gerrit.cloudera.org:8080/#/c/4065/8/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:

PS8, Line 857: uuid query_uuid = uuid_generator();
> don't need the temporary?
i'll update that in the  next set of changes.


http://gerrit.cloudera.org:8080/#/c/4065/8/be/src/util/uid-util-test.cc
File be/src/util/uid-util-test.cc:

PS8, Line 37: EXPECT_EQ(qid.hi, query_id.hi);
: EXPECT_EQ(qid.lo, query_id.lo);
> Why not:
ditto


http://gerrit.cloudera.org:8080/#/c/4065/8/be/src/util/uid-util.h
File be/src/util/uid-util.h:

PS8, Line 51: const
> constexpr
what's the advantage in this context?


Line 55:   memcpy(&result.hi, &uuid.data[0], 8);
> do we know that memcpy of 8 bytes generates the same asm as :
what is the concern?


-- 
To view, visit http://gerrit.cloudera.org:8080/4065
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia56a03ed9a1d7e77c72b66a01cd48c5b6bf3624f
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Marcel Kornacker 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3988: Only use first 96 bits of query id

2016-08-21 Thread Marcel Kornacker (Code Review)
Hello Michael Ho,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/4065

to look at the new patch set (#8).

Change subject: IMPALA-3988: Only use first 96 bits of query id
..

IMPALA-3988: Only use first 96 bits of query id

This adds utility functions in uid-util.h to create query and instance
ids and convert between them. It also adapts SimpleScheduler to
utilize those functions when creating the instance id
(TPlanFragmentInstanceCtx.fragment_instance_id).

Change-Id: Ia56a03ed9a1d7e77c72b66a01cd48c5b6bf3624f
---
M be/src/scheduling/simple-scheduler.cc
M be/src/service/impala-server.cc
M be/src/util/CMakeLists.txt
A be/src/util/uid-util-test.cc
M be/src/util/uid-util.h
M common/thrift/ImpalaInternalService.thrift
6 files changed, 96 insertions(+), 10 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/65/4065/8
-- 
To view, visit http://gerrit.cloudera.org:8080/4065
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia56a03ed9a1d7e77c72b66a01cd48c5b6bf3624f
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Marcel Kornacker 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho 


[Impala-ASF-CR] IMPALA-3988: Only use first 96 bits of query id

2016-08-21 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-3988: Only use first 96 bits of query id
..


Patch Set 7:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/4065/7/be/src/util/uid-util.h
File be/src/util/uid-util.h:

PS7, Line 61: (1L << 32) - 1
> IMHO, defining this as a constant like the previous patch seems to be bette
Done


PS7, Line 66: -1
> nit: missing space.
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/4065
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia56a03ed9a1d7e77c72b66a01cd48c5b6bf3624f
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Marcel Kornacker 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3902: Scheduler improvements for running multiple fragment instances on a single backend

2016-08-21 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-3902: Scheduler improvements for running multiple 
fragment instances on a single backend
..


Patch Set 3:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/4054/3/fe/src/main/java/com/cloudera/impala/service/Frontend.java
File fe/src/main/java/com/cloudera/impala/service/Frontend.java:

Line 964:   planner.computeResourceReqs(fragments, true, queryExecRequest);
> This function assumes that the entire query is captured by the given fragme
anything llama-related is going to be removed for the next release. i'll leave 
this alone for now.


Line 1042: for (int idx = 0; idx < fragments.size(); ++idx) {
> factor our the common code between createExecRequest() and createPlanExecIn
all of the non-mt paths (meaning the functions for which this patch creates a 
"Mt-" equivalent) are going to be retired in the not-too-distant future. i 
didn't feel that factoring out common code between the two alternative paths 
has value, because a) this code has been fairly static in the past and doesn't 
see many changes, b) some of this may get ripped out later anyway.

in other words, you should consider the "Mt-" functions the end state, and look 
at them from the perspective of 'is that what the code should look like?' 
rather than 'if we have both the Mt- and the other path, what should the code 
look like?'.

i'll augment the commit msg.


-- 
To view, visit http://gerrit.cloudera.org:8080/4054
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I240445199e22f009f4e72fdb8754eb8d77e3d680
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Marcel Kornacker 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3988: Only use first 96 bits of query id

2016-08-21 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-3988: Only use first 96 bits of query id
..


Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4065/7//COMMIT_MSG
Commit Message:

Line 9: This adds utility function in uid-util.h to create query and instance
fixed the grammar error


-- 
To view, visit http://gerrit.cloudera.org:8080/4065
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia56a03ed9a1d7e77c72b66a01cd48c5b6bf3624f
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Marcel Kornacker 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3988: Only use first 96 bits of query id

2016-08-20 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has uploaded a new patch set (#7).

Change subject: IMPALA-3988: Only use first 96 bits of query id
..

IMPALA-3988: Only use first 96 bits of query id

This adds utility function in uid-util.h to create query and instance
ids and convert between them. It also adapts SimpleScheduler to
utilize those functions when creating the instance id
(TPlanFragmentInstanceCtx.fragment_instance_id).

Change-Id: Ia56a03ed9a1d7e77c72b66a01cd48c5b6bf3624f
---
M be/src/scheduling/simple-scheduler.cc
M be/src/service/impala-server.cc
M be/src/util/CMakeLists.txt
A be/src/util/uid-util-test.cc
M be/src/util/uid-util.h
M common/thrift/ImpalaInternalService.thrift
6 files changed, 94 insertions(+), 10 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/65/4065/7
-- 
To view, visit http://gerrit.cloudera.org:8080/4065
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia56a03ed9a1d7e77c72b66a01cd48c5b6bf3624f
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Marcel Kornacker 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho 


[Impala-ASF-CR] IMPALA-3988: Only use first 96 bits of query id

2016-08-20 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-3988: Only use first 96 bits of query id
..


Patch Set 6:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/4065/6/be/src/scheduling/simple-scheduler.cc
File be/src/scheduling/simple-scheduler.cc:

PS6, Line 505: int
> Will there be a chance of overflow here ? Should we use int64_t instead ?
there is no plausible scenario in which we'll ever have 1 billion fragment 
instances for a single query, let alone more than that.

i'll adapt the interface.


http://gerrit.cloudera.org:8080/#/c/4065/6/be/src/util/uid-util.h
File be/src/util/uid-util.h:

PS6, Line 75: int
> Does this have the same overflow problem here ?
this interface doesn't really make any sense, i'll rework it (and hardwire 
int32_t).


-- 
To view, visit http://gerrit.cloudera.org:8080/4065
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia56a03ed9a1d7e77c72b66a01cd48c5b6bf3624f
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Marcel Kornacker 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3988: Only use first 96 bits of query id

2016-08-20 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has uploaded a new patch set (#6).

Change subject: IMPALA-3988: Only use first 96 bits of query id
..

IMPALA-3988: Only use first 96 bits of query id

This adds utility function in uid-util.h to create query and instance
ids and convert between them. It also adapts SimpleScheduler to
utilize those functions when creating the instance id
(TPlanFragmentInstanceCtx.fragment_instance_id).

Change-Id: Ia56a03ed9a1d7e77c72b66a01cd48c5b6bf3624f
---
M be/src/scheduling/simple-scheduler.cc
M be/src/scheduling/simple-scheduler.h
M be/src/service/impala-server.cc
M be/src/util/CMakeLists.txt
A be/src/util/uid-util-test.cc
M be/src/util/uid-util.h
M common/thrift/ImpalaInternalService.thrift
7 files changed, 107 insertions(+), 11 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/65/4065/6
-- 
To view, visit http://gerrit.cloudera.org:8080/4065
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia56a03ed9a1d7e77c72b66a01cd48c5b6bf3624f
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Marcel Kornacker 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Marcel Kornacker 


[Impala-CR](cdh5-trunk) IMPALA-2809: Improve ByteSwap with builtin function or SSSE3 or AVX2.

2016-08-20 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-2809: Improve ByteSwap with builtin function or SSSE3 or 
AVX2.
..


Patch Set 45: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/3081/45/be/src/util/bit-util.cc
File be/src/util/bit-util.cc:

Line 169: inline void SimdByteSwap::ByteSwapSimd(const void* source, const int 
len, void* dest) {
add DCHECK(TEMPLATE_DATA_WIDTH == 16 || TEMPLATE_DATA_WIDTH == 32); 
right below here

otherwise bswap_fptr never gets set


-- 
To view, visit http://gerrit.cloudera.org:8080/3081
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I392ed5a8d5683f30f161282c228c1aedd7b648c1
Gerrit-PatchSet: 45
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Youwei Wang 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Youwei Wang 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3988: Only use first 104 bits of query id

2016-08-19 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has uploaded a new patch set (#5).

Change subject: IMPALA-3988: Only use first 104 bits of query id
..

IMPALA-3988: Only use first 104 bits of query id

This adds utility function in uid-util.h to create query and instance
ids and convert between them. It also adapts SimpleScheduler to
utilize those functions when creating the instance id
(TPlanFragmentInstanceCtx.fragment_instance_id).

Change-Id: Ia56a03ed9a1d7e77c72b66a01cd48c5b6bf3624f
---
M be/src/scheduling/simple-scheduler.cc
M be/src/scheduling/simple-scheduler.h
M be/src/service/impala-server.cc
M be/src/util/CMakeLists.txt
A be/src/util/uid-util-test.cc
M be/src/util/uid-util.h
M common/thrift/ImpalaInternalService.thrift
7 files changed, 105 insertions(+), 11 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/65/4065/5
-- 
To view, visit http://gerrit.cloudera.org:8080/4065
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia56a03ed9a1d7e77c72b66a01cd48c5b6bf3624f
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Marcel Kornacker 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Marcel Kornacker 


  1   2   3   4   5   >