[Impala-ASF-CR] IMPALA-3610: Account for memory used by filters in the coordinator
Henry Robinson has posted comments on this change. Change subject: IMPALA-3610: Account for memory used by filters in the coordinator .. Patch Set 1: (1 comment) Added code to move, rather than copy, input filter to output, which prevents briefly having two copies of the same large filter. http://gerrit.cloudera.org:8080/#/c/4066/1/be/src/runtime/coordinator.cc File be/src/runtime/coordinator.cc: PS1, Line 1994: ++state->num_received; : DCHECK_LE(state->num_received, state->src_fragment_instance_idxs.size()); > This doesn't seem very necessary anymore. It seems redundant seeing that we Good catch, done. -- To view, visit http://gerrit.cloudera.org:8080/4066 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3c52c8a1c2e79ef370c77bf264885fc859678d1b Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry Robinson Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Sailesh Mukil Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3610: Account for memory used by filters in the coordinator
Hello Sailesh Mukil, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/4066 to look at the new patch set (#2). Change subject: IMPALA-3610: Account for memory used by filters in the coordinator .. IMPALA-3610: Account for memory used by filters in the coordinator Before this patch, Impala would not account for the memory used to aggregate runtime filters together in the coordinator. Impala's memory could therefore be silently overcommitted. This patch accounts for aggregated filter memory in a new filter memtracker that is attached to the coordinator's query_mem_tracker(). If the query memory limit is exceeded when a filter update arrives, that update is discarded. If the filter is from a partitioned join, the entire filter can therefore be discarded immediately (to alleviate memory pressure) and a dummy 'always true' filter is sent to backends to unblock them. If the filter is from a broadcast join, no aggregation is done, so there is no tracking. The Thrift input and output filter data structures are not tracked (as we generally don't track RPC objects, but plan to in the future). The filter payload is moved from the input request structure to the output broadcast structure without copying. Memory that is added to a memtracker must always be released. To do this, we need to signal to the coordinator that it is finished, and that there is no point trying to process any future updates that might arrive concurrently. This patch adds Coordinator::Done() which is called from QueryExecState::Done(), and which releases memory from all in-process runtime filters. Finally, this patch increases the upper limit for runtime filters to 512MB. This allows testing on very large datasets. The default maximum is still 16MB, per RUNTIME_FILTER_MAX_SIZE. Testing: Added a new test that triggers the OOM condition on the coordinator. All existing runtime filter tests pass. Change-Id: I3c52c8a1c2e79ef370c77bf264885fc859678d1b --- M be/src/runtime/coordinator.cc M be/src/runtime/coordinator.h M be/src/runtime/mem-tracker.h M be/src/runtime/runtime-filter-bank.h M be/src/service/impala-server.h M be/src/service/query-exec-state.cc M testdata/workloads/functional-query/queries/QueryTest/runtime_row_filters_phj.test 7 files changed, 139 insertions(+), 29 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/66/4066/2 -- To view, visit http://gerrit.cloudera.org:8080/4066 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I3c52c8a1c2e79ef370c77bf264885fc859678d1b Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry Robinson Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Sailesh Mukil
Re: Error linking impalad with -build_shared_libs option
Hi Martin, Nice to hear from you. https://issues.cloudera.org/browse/IMPALA-3979 Thanks, Dan On Tuesday, August 23, 2016, Martin Grund (Das Grundprinzip.de) < grundprin...@gmail.com> wrote: > I've seen this in the past and propose a patch to fix it: > https://gerrit.cloudera.org/#/c/4108/ > > Let me know if you have JIRA to track this to update the commit message. > > Martin > > On Mon, Aug 22, 2016 at 10:15 PM Laszlo Gaal > > wrote: > > > link.txt is generated by cmake based on the contents of CMakeLists.txt > and > > the project/environment settings. Most of the link-time dependencies seem > > to be set up in $(IMPALA_HOME)/be/CMakeLists.txt, based on settings > > calculated one level up in $(IMPALA_HOME)/CMakeLists.txt. > > $(IMPALA_HOME)/be/CMakeLists.txt looks complex; I'll look more into it > > tomorrow. > > > > Thanks, > > > > - LaszloG > > > > On Mon, Aug 22, 2016 at 6:07 PM, Yonghyun Hwang > > > wrote: > > > > > I observed the same linking error and digged this into further. The > below > > > is what I've found. > > > > > > *1. how to reproduce* > > > > > > # assuming that impala-config.sh is sourced > > > $ cd ${IMPALA_HOME} > > > *$ make -j8 impalad* > > > ... > > > [100%] Built target CodeGen > > > Linking CXX executable ../../build/debug/service/impalad > > > ... > > > ../../build/debug/util/libUtil.so: undefined reference to > > > `google_breakpad::ExceptionHandler::WriteMinidump(std::string const&, > > bool > > > (*)(google_breakpad::MinidumpDescriptor const&, void*, bool), void*)' > > > ../../build/debug/util/libUtil.so: undefined reference to > > > `snappy::MaxCompressedLength(unsigned long)' > > > ../../build/debug/util/libUtil.so: undefined reference to `my_strlen' > > > ../../build/debug/util/libUtil.so: undefined reference to > `LZ4_compress' > > > ../../build/debug/util/libUtil.so: undefined reference to > > > `BZ2_bzDecompress' > > > ... > > > > > > > > > *$ cd ${IMPALA_HOME}/be/src/service* > > > *$ ${IMPALA_HOME}/toolchain/cmake-3.2.3-p1/bin/cmake -E * > > > *cmake_link_script CMakeFiles/impalad.dir/link.txt --verbose=* > > > ... > > > ../../build/debug/util/libUtil.so: undefined reference to > > > ... > > > *../../build/debug/util/libUtil.so: undefined reference to > > `LZ4_compress'* > > > *../../build/debug/util/libUtil.so: undefined reference to > > > `BZ2_bzDecompress'* > > > ../../build/debug/util/libUtil.so: undefined reference to > > > *`snappy::RawCompress*(char const*, > > > unsigned long, char*, unsigned long*)' > > > ... > > > boost::regex_traits > > > >::do_assign(char > > > const*, char > > > const*, unsigned int)' > > > ... > > > ../../build/debug/util/libUtil.so: undefined reference to *`snappy::* > > > RawUncompress(char > > > const*, unsigned long, char*)' > > > > > > *2. cause of the failure* > > > > > > $ cd ${IMPALA_HOME}/be/src/service > > > *$ cat CMakeFiles/impalad.dir/link.txt* > > > ${IMPALA_HOME}/toolchain/gcc-4.9.2/bin/g++-Wall -Wno-sign-compare > > > -Wno-unknown-pragmas -pthread -fno-strict-aliasing -std=c++14 > > > -Wno-deprecated ... > > > *../../../toolchain/snappy-1.1.3/lib/libsnappy.a > > > ../../../toolchain/lz4-svn/lib/liblz4.a* > > > ... > > > *../../../toolchain/boost-1.57.0/lib/libboost_regex.a* > > > ... > > > *../../build/debug/util/libUtil.so* > > > -lpthread -ldl -lssl -lcrypto > > > > > > Apparently, libUtil.so __should__ show before libsnappy.a and *regex.a. > > > > > > > > > Next step is to find out who creates CMakeFiles/impalad.dir/link.txt > and > > > why libUtil.so is placed after libsnappy.a, liblz4.a, and > > libboost_regex.a > > > > > > Thank you, > > > Yonghyun > > > > > > > > > > > > > > > > > > On Wed, Aug 17, 2016 at 10:13 AM, Henry Robinson > > > > wrote: > > > > > > > Only that it's IMPALA-3979. Does anyone have any theories yet? > > > > > > > > On 17 August 2016 at 10:10, Dimitris Tsirogiannis < > > > > dtsirogian...@cloudera.com > wrote: > > > > > > > > > Hey guys, > > > > > > > > > > When I build from the latest master using the -build_shared_libs > > > option I > > > > > get linking errors (see [1]). Building with static works fine. Any > > > ideas? > > > > > > > > > > Thanks > > > > > Dimitris > > > > > > > > > > > > > > > [1] > > > > > Linking CXX executable ../../build/debug/service/impalad > > > > > ../../build/debug/util/libUtil.so: error: undefined reference to > > > > > 'BZ2_bzBuffToBuffCompress' > > > > > ../../build/debug/util/libUtil.so: error: undefined reference to > > > > > 'snappy::MaxCompressedLength(unsigned long)' > > > > > ../../build/debug/util/libUtil.so: error: undefined reference to > > > > > 'snappy::RawCompress(char const*, unsigned long, char*, unsigned > > > long*)' > > > > > ../../build/debug/util/libUtil.so: error: undefined reference to > > > > > 'LZ4_compress' > > > > > ../../build/debug/util/libUtil.so: error: undefined reference to > > > > > 'BZ2_bzDecompressEnd' > > > > > ../../build/debug/util/libUtil.so: error: undefined reference to > > > > > 'BZ2_bzDecompressInit' >
[Impala-CR](cdh5-trunk) IMPALA-3610: Account for memory used by filters in the coordinator
Henry Robinson has abandoned this change. Change subject: IMPALA-3610: Account for memory used by filters in the coordinator .. Abandoned Moved to ASF project. -- To view, visit http://gerrit.cloudera.org:8080/3618 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: abandon Gerrit-Change-Id: I3c52c8a1c2e79ef370c77bf264885fc859678d1b Gerrit-PatchSet: 3 Gerrit-Project: Impala Gerrit-Branch: cdh5-trunk Gerrit-Owner: Henry Robinson Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Sailesh Mukil
[Impala-ASF-CR] IMPALA-3221: Copyright / license audit
Henry Robinson has posted comments on this change. Change subject: IMPALA-3221: Copyright / license audit .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/3995/1/LICENSE.txt File LICENSE.txt: Line 461: shell/ext-py/prettytable-0.7.1: 3-clause BSD (see shell/ext-py/prettytable-0.7.1/PKG-INFO) > There is sopy copyright text in shell/ext-py/prettytable-0.7.1/COPYING. sql Thanks for finding those! I don't *think* we need a COPYING for sasl; the entry in LICENSE is enough (I think). -- To view, visit http://gerrit.cloudera.org:8080/3995 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I24a868aec6a4f17f4ccca1b088d2f0de32f75d87 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry Robinson Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Jim Apple Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3221: Copyright / license audit
Henry Robinson has uploaded a new patch set (#3). Change subject: IMPALA-3221: Copyright / license audit .. IMPALA-3221: Copyright / license audit Populates LICENSE.txt with known third-party licenses in the Impala codebase. Change-Id: I24a868aec6a4f17f4ccca1b088d2f0de32f75d87 --- M LICENSE.txt D be/src/gutil/LICENSE.txt M be/src/runtime/string-search.h 3 files changed, 354 insertions(+), 82 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/3995/3 -- To view, visit http://gerrit.cloudera.org:8080/3995 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I24a868aec6a4f17f4ccca1b088d2f0de32f75d87 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry Robinson Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Jim Apple
Re: Error linking impalad with -build_shared_libs option
I've seen this in the past and propose a patch to fix it: https://gerrit.cloudera.org/#/c/4108/ Let me know if you have JIRA to track this to update the commit message. Martin On Mon, Aug 22, 2016 at 10:15 PM Laszlo Gaal wrote: > link.txt is generated by cmake based on the contents of CMakeLists.txt and > the project/environment settings. Most of the link-time dependencies seem > to be set up in $(IMPALA_HOME)/be/CMakeLists.txt, based on settings > calculated one level up in $(IMPALA_HOME)/CMakeLists.txt. > $(IMPALA_HOME)/be/CMakeLists.txt looks complex; I'll look more into it > tomorrow. > > Thanks, > > - LaszloG > > On Mon, Aug 22, 2016 at 6:07 PM, Yonghyun Hwang > wrote: > > > I observed the same linking error and digged this into further. The below > > is what I've found. > > > > *1. how to reproduce* > > > > # assuming that impala-config.sh is sourced > > $ cd ${IMPALA_HOME} > > *$ make -j8 impalad* > > ... > > [100%] Built target CodeGen > > Linking CXX executable ../../build/debug/service/impalad > > ... > > ../../build/debug/util/libUtil.so: undefined reference to > > `google_breakpad::ExceptionHandler::WriteMinidump(std::string const&, > bool > > (*)(google_breakpad::MinidumpDescriptor const&, void*, bool), void*)' > > ../../build/debug/util/libUtil.so: undefined reference to > > `snappy::MaxCompressedLength(unsigned long)' > > ../../build/debug/util/libUtil.so: undefined reference to `my_strlen' > > ../../build/debug/util/libUtil.so: undefined reference to `LZ4_compress' > > ../../build/debug/util/libUtil.so: undefined reference to > > `BZ2_bzDecompress' > > ... > > > > > > *$ cd ${IMPALA_HOME}/be/src/service* > > *$ ${IMPALA_HOME}/toolchain/cmake-3.2.3-p1/bin/cmake -E * > > *cmake_link_script CMakeFiles/impalad.dir/link.txt --verbose=* > > ... > > ../../build/debug/util/libUtil.so: undefined reference to > > ... > > *../../build/debug/util/libUtil.so: undefined reference to > `LZ4_compress'* > > *../../build/debug/util/libUtil.so: undefined reference to > > `BZ2_bzDecompress'* > > ../../build/debug/util/libUtil.so: undefined reference to > > *`snappy::RawCompress*(char const*, > > unsigned long, char*, unsigned long*)' > > ... > > boost::regex_traits > > >::do_assign(char > > const*, char > > const*, unsigned int)' > > ... > > ../../build/debug/util/libUtil.so: undefined reference to *`snappy::* > > RawUncompress(char > > const*, unsigned long, char*)' > > > > *2. cause of the failure* > > > > $ cd ${IMPALA_HOME}/be/src/service > > *$ cat CMakeFiles/impalad.dir/link.txt* > > ${IMPALA_HOME}/toolchain/gcc-4.9.2/bin/g++-Wall -Wno-sign-compare > > -Wno-unknown-pragmas -pthread -fno-strict-aliasing -std=c++14 > > -Wno-deprecated ... > > *../../../toolchain/snappy-1.1.3/lib/libsnappy.a > > ../../../toolchain/lz4-svn/lib/liblz4.a* > > ... > > *../../../toolchain/boost-1.57.0/lib/libboost_regex.a* > > ... > > *../../build/debug/util/libUtil.so* > > -lpthread -ldl -lssl -lcrypto > > > > Apparently, libUtil.so __should__ show before libsnappy.a and *regex.a. > > > > > > Next step is to find out who creates CMakeFiles/impalad.dir/link.txt and > > why libUtil.so is placed after libsnappy.a, liblz4.a, and > libboost_regex.a > > > > Thank you, > > Yonghyun > > > > > > > > > > > > On Wed, Aug 17, 2016 at 10:13 AM, Henry Robinson > > wrote: > > > > > Only that it's IMPALA-3979. Does anyone have any theories yet? > > > > > > On 17 August 2016 at 10:10, Dimitris Tsirogiannis < > > > dtsirogian...@cloudera.com> wrote: > > > > > > > Hey guys, > > > > > > > > When I build from the latest master using the -build_shared_libs > > option I > > > > get linking errors (see [1]). Building with static works fine. Any > > ideas? > > > > > > > > Thanks > > > > Dimitris > > > > > > > > > > > > [1] > > > > Linking CXX executable ../../build/debug/service/impalad > > > > ../../build/debug/util/libUtil.so: error: undefined reference to > > > > 'BZ2_bzBuffToBuffCompress' > > > > ../../build/debug/util/libUtil.so: error: undefined reference to > > > > 'snappy::MaxCompressedLength(unsigned long)' > > > > ../../build/debug/util/libUtil.so: error: undefined reference to > > > > 'snappy::RawCompress(char const*, unsigned long, char*, unsigned > > long*)' > > > > ../../build/debug/util/libUtil.so: error: undefined reference to > > > > 'LZ4_compress' > > > > ../../build/debug/util/libUtil.so: error: undefined reference to > > > > 'BZ2_bzDecompressEnd' > > > > ../../build/debug/util/libUtil.so: error: undefined reference to > > > > 'BZ2_bzDecompressInit' > > > > ../../build/debug/util/libUtil.so: error: undefined reference to > > > > 'BZ2_bzBuffToBuffDecompress' > > > > ../../build/debug/util/libUtil.so: error: undefined reference to > > > > 'BZ2_bzDecompress' > > > > ../../build/debug/util/libUtil.so: error: undefined reference to > > > > 'snappy::GetUncompressedLength(char const*, unsigned long, unsigned > > > > long*)' > > > > ../../build/debug/util/libUtil.so: error: undefined reference to > > > > 'snappy::RawUncompres
[Impala-ASF-CR] Fix dynamic linking for Impala
Martin Grund has uploaded a new change for review. http://gerrit.cloudera.org:8080/4108 Change subject: Fix dynamic linking for Impala .. Fix dynamic linking for Impala When building Impala, the user can choose between two different build modes: static and dynamically linked libraries. The first one is used for release builds, while the latter one is used mainly locally for developers to shorten build times. However, for some time now, the shared build has been broken in some configurations. This patch fixes this issue. The main solution is to move the webserver into the Util library and to remove the manual linking of Util into the error-utils-test. Change-Id: I2fb1efd600bfaf17de730186fc80a0dbd976a049 --- M be/CMakeLists.txt M be/src/util/CMakeLists.txt 2 files changed, 8 insertions(+), 12 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/08/4108/1 -- To view, visit http://gerrit.cloudera.org:8080/4108 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I2fb1efd600bfaf17de730186fc80a0dbd976a049 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Martin Grund
[Impala-ASF-CR] IMPALA-4014: Introduce query-wide execution state.
Henry Robinson has posted comments on this change. Change subject: IMPALA-4014: Introduce query-wide execution state. .. Patch Set 10: (51 comments) http://gerrit.cloudera.org:8080/#/c/3817/10/be/src/runtime/fragment-instance-exec-state.h File be/src/runtime/fragment-instance-exec-state.h: Line 32: /// Execution state of a single plan fragment. instance PS10, Line 33: FragmentInstanceExecState Let's consider FragmentInstanceState and QueryState (if they're not executing, what are they doing having state?) PS10, Line 39: lambda( : boost::mem_fn(&FragmentInstanceExecState::ReportStatusCb), : this, _1, _2, _3) Try and find a clearer way to write this. If you don't want to write the lambda in the constructor list, consider a static function that returns a lambda given 'this' as an input: std::function MakeCb(FragmentInstanceExecState* fragment_instance) { return [fragment_instance](const Status& s, RuntimeProfile* profile, bool done) { fragment_instance->ReportStatusCb(s, profile, done); } } Please leave a TODO to aggregate all the reports into a single CB managed by the QES (and file a JIRA as well). PS10, Line 80: ImpalaBackendClientCache* client_cache_; Is this necessary given that ExecEnv can give us the same pointer? PS10, Line 77: QueryExecState* query_exec_state_; // not owned : TPlanFragmentInstanceCtx fragment_instance_ctx_; : FragmentInstanceExecutor executor_; : ImpalaBackendClientCache* client_cache_; : TExecPlanFragmentParams exec_params_; Good opportunity to add some comments for these. PS10, Line 83: plan fragment fragment instance. http://gerrit.cloudera.org:8080/#/c/3817/10/be/src/runtime/fragment-instance-executor.cc File be/src/runtime/fragment-instance-executor.cc: PS10, Line 80: Close Since we are aiming for deterministic, manual management of finst lifetimes, how about moving this out of the destructor and ensuring that the FInst calls Close() explicitly? Line 349: // We also want to call sink_->Close() here rather than in FragmentInstanceExecutor::Close long line http://gerrit.cloudera.org:8080/#/c/3817/10/be/src/runtime/fragment-instance-executor.h File be/src/runtime/fragment-instance-executor.h: Line 48: /// FragmentInstanceExecutor handles all aspects of the execution of a single plan fragment, long line PS10, Line 155: ExecEnv* exec_env_; // not owned Can you remove this? ExecEnv::GetInstance() works fine. http://gerrit.cloudera.org:8080/#/c/3817/10/be/src/runtime/query-exec-state.cc File be/src/runtime/query-exec-state.cc: PS10, Line 60: remove space PS10, Line 73: DCHECK_EQ(fragment_inst_exec_state_map_.find(GetInstanceIdx(fragment_instance_id)), : fragment_inst_exec_state_map_) these aren't the same type (does this compile?). PS10, Line 76: FragmentInstanceExecState* fragment_inst_exec_state = : new FragmentInstanceExecState(exec_params, this, ExecEnv::GetInstance()); Presumably this should be managed by obj_pool. If not, the value type of the map should be unique_ptr<...>. PS10, Line 80: his RPC returns This isn't an RPC any more. Update comment. PS10, Line 113: lexical_cast PrintId() PS10, Line 128: lexical_cast PrintId() http://gerrit.cloudera.org:8080/#/c/3817/10/be/src/runtime/query-exec-state.h File be/src/runtime/query-exec-state.h: Line 38: /// This class is responsible for creating, running and destroying FragmentInstanceExecStates. Talk about lifetime here: when is this created, when is it destroyed, what are the guarantees about the subordinate data structures like finstexecstates. Can this outlive a query on the coordinator? etc. PS10, Line 41: // Special constructor for the coordinator. TODO: remove. PS10, Line 59: CleanFragmentInstanceStates Don't talk about implementation details (cleaning the map). When would this be called? Line 61: /// Registers a new FragmentInstanceExecState and launches the thread that calls Prepare() and long line. PS10, Line 90: std::unique_ptr obj_pool_ Comment broadly what this is used to manage, since that affects lifecycles. PS10, Line 90: * do you mean unique_ptr ? PS10, Line 90: Owned redundant wrt unique_ptr PS10, Line 94: active expand. What does it mean for a fragment to be active? PS10, Line 105: shared_ptr Update comment. PS10, Line 109: FragmentInstanceExecState Are the exec states managed by the obj pool? If so, say so otherwise the lifecycle is not clear. PS10, Line 113: shared pointer update comment http://gerrit.cloudera.org:8080/#/c/3817/10/be/src/service/client-request-exec-state.cc File be/src/service/client-request-exec-state.cc: Line 17: #include "service/client-request-exec-state.h" add a blank line before this one http://gerrit.cloudera.org:8080/#/c/3817/10/be/src/service/client-request-ex
[Impala-ASF-CR] IMPALA-(3895,3859): Don't log file data on parse errors
Henry Robinson has posted comments on this change. Change subject: IMPALA-(3895,3859): Don't log file data on parse errors .. Patch Set 5: Exhaustive tests failed because the offsets for the GZIP version of alltypeserrorornulls is different than for most other file formats. I need to change that test to use row_regex instead. -- To view, visit http://gerrit.cloudera.org:8080/4020 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I5a604f8784a9ff7b4bf878f82ee7f56697df3272 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry Robinson Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-1731,IMPALA-3868: Float values are not parsed correctly
Internal Jenkins has posted comments on this change. Change subject: IMPALA-1731,IMPALA-3868: Float values are not parsed correctly .. Patch Set 5: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/3791 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9e17d0f051b300a22a520ce34e276c2d4460d35e Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Attila Jeges Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Ho Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-1731,IMPALA-3868: Float values are not parsed correctly
Internal Jenkins has submitted this change and it was merged. Change subject: IMPALA-1731,IMPALA-3868: Float values are not parsed correctly .. IMPALA-1731,IMPALA-3868: Float values are not parsed correctly Fixed StringToFloatInternal() not to parse strings like "1.23inf" and "infinite" with leading/trailing garbage as Infinity. These strings are now rejected with PARSE_FAILURE. Only "inf" and "infinity" are accepted, parsing is case-insensitive. "NaN" values are handled similarly: strings with leading/trailing garbage like "nana" are rejected, parsing is case-insensitive. Other changes: - StringToFloatInternal() was cleaned up a bit. Parsing inf and NaN strings was moved out of the main loop. - Use std::numeric_limits::infinity() instead of INFINITY macro and std::numeric_limits::quiet_NaN() instead of NAN macro. - Fixed another minor bug: multiple dots are allowed when parsing float values (e.g. "2.1..6" is interpreted as 2.16). - New BE and E2E tests were added. Change-Id: I9e17d0f051b300a22a520ce34e276c2d4460d35e Reviewed-on: http://gerrit.cloudera.org:8080/3791 Reviewed-by: Michael Ho Tested-by: Internal Jenkins --- M be/src/exprs/expr-test.cc M be/src/util/string-parser-test.cc M be/src/util/string-parser.h M testdata/workloads/functional-query/queries/QueryTest/exprs.test 4 files changed, 205 insertions(+), 41 deletions(-) Approvals: Michael Ho: Looks good to me, approved Internal Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/3791 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I9e17d0f051b300a22a520ce34e276c2d4460d35e Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Attila Jeges Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Ho
[Impala-ASF-CR] IMPALA-3832: test invalid data handling in lzo text scanner
Internal Jenkins has posted comments on this change. Change subject: IMPALA-3832: test invalid data handling in lzo text scanner .. Patch Set 4: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/4096 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ib707014c1fcfb80cb8076f644fc2b62a5ae758d7 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3832: test invalid data handling in lzo text scanner
Internal Jenkins has submitted this change and it was merged. Change subject: IMPALA-3832: test invalid data handling in lzo text scanner .. IMPALA-3832: test invalid data handling in lzo text scanner This adds the lzo text scanner to the fuzz testing. I ran the test in a loop under ASAN overnight and didn't see any failures. Change-Id: Ib707014c1fcfb80cb8076f644fc2b62a5ae758d7 Reviewed-on: http://gerrit.cloudera.org:8080/4096 Reviewed-by: Tim Armstrong Tested-by: Internal Jenkins --- M tests/query_test/test_scanners_fuzz.py 1 file changed, 17 insertions(+), 8 deletions(-) Approvals: Internal Jenkins: Verified Tim Armstrong: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/4096 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: Ib707014c1fcfb80cb8076f644fc2b62a5ae758d7 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-4001: qgen: add proof of concept tests for Query() objects
David Knupp has posted comments on this change. Change subject: IMPALA-4001: qgen: add proof of concept tests for Query() objects .. Patch Set 1: (5 comments) http://gerrit.cloudera.org:8080/#/c/4081/1/tests/comparison/tests/fake_query.py File tests/comparison/tests/fake_query.py: PS1, Line 97: must at least a "must at least *contain* a" ...? "must at least *be* a" ...? PS1, Line 97: fram s/fram/from/ http://gerrit.cloudera.org:8080/#/c/4081/1/tests/comparison/tests/query_object_testdata.py File tests/comparison/tests/query_object_testdata.py: PS1, Line 64: DATASET So -- these are really test cases, yes? Could this just be called QUERY_TEST_CASES, or something. DATASET seems confusing, at least to me. http://gerrit.cloudera.org:8080/#/c/4081/1/tests/comparison/tests/test_query_objects.py File tests/comparison/tests/test_query_objects.py: PS1, Line 25: query_something I'm a little confused by this. query_something is a ... QueryTest object (namedtuple), right? From the DATASET list? Maybe a more descriptive variable name could be employed here, to help clarify some of the pytest magic going on. Line 53: @pytest.mark.parametrize('query_test', DATASET, ids=_idfn) How do you feel about moving all of the tests to the end of the file, rather than interspersing them with utility functions? -- To view, visit http://gerrit.cloudera.org:8080/4081 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I2ed1960430ae0af469986e33f88aecb6fa74e999 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Brown Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Taras Bobrovytsky Gerrit-HasComments: Yes
[Impala-CR](cdh5-trunk) IMPALA-2428: Support multiple-character string as the field delimiter
Yuanhao Luo has abandoned this change. Change subject: IMPALA-2428: Support multiple-character string as the field delimiter .. Abandoned I will re-push this commit when Impala comes to 3.0 -- To view, visit http://gerrit.cloudera.org:8080/3314 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: abandon Gerrit-Change-Id: Id1437ca35dc4fdc58a7db1c2c70d4da30adf0c3e Gerrit-PatchSet: 7 Gerrit-Project: Impala Gerrit-Branch: cdh5-trunk Gerrit-Owner: Yuanhao Luo Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Yuanhao Luo
[Impala-ASF-CR] IMPALA-3945: Forbid create text table with nonsensical delimiter combinations.
Yuanhao Luo has abandoned this change. Change subject: IMPALA-3945: Forbid create text table with nonsensical delimiter combinations. .. Abandoned I will re-push this commit when Impala comes to 3.0 -- To view, visit http://gerrit.cloudera.org:8080/3839 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: abandon Gerrit-Change-Id: I836b45f930e680ea03274405b90b1ee14822152b Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Yuanhao Luo Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Yuanhao Luo
[Impala-ASF-CR] IMPALA-3988: Only use first 96 bits of query id
Henry Robinson has posted comments on this change. Change subject: IMPALA-3988: Only use first 96 bits of query id .. Patch Set 8: (2 comments) http://gerrit.cloudera.org:8080/#/c/4065/8/be/src/util/uid-util.h File be/src/util/uid-util.h: PS8, Line 51: const > what's the advantage in this context? Tells the reader and the compiler that this is really and truly a constant (i.e. not dependent on any variable), and is available for use in other constexpr contexts; for example template parameters. At this point it's mostly a readability aid. Line 55: memcpy(&result.hi, &uuid.data[0], 8); > what is the concern? Seemed more readable as an assignment to me. But the compiler special-cases the 8-byte copy AFAICT, so not worth changing. -- To view, visit http://gerrit.cloudera.org:8080/4065 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia56a03ed9a1d7e77c72b66a01cd48c5b6bf3624f Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Marcel Kornacker Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2932: Extend DistributedPlanner to account for hash table build cost
Thomas Tauber-Marshall has posted comments on this change. Change subject: IMPALA-2932: Extend DistributedPlanner to account for hash table build cost .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/4098/2/fe/src/main/java/com/cloudera/impala/planner/DistributedPlanner.java File fe/src/main/java/com/cloudera/impala/planner/DistributedPlanner.java: Line 461: lhsTree.getCardinality() * ExchangeNode.getAvgSerializedRowSize(lhsTree)); > you're not guaranteed to have computed rhsDataSize at this point Sorry about that. It seems like this might be a bug with the existing code, though? Since it means that we don't do the hash table size check at line 489 in situations where we have a rhs cardinality estimate but no lhs num nodes, since we don't calculate the broadcast cost even though we could calculate the size of the rhs in this situation. I submitted a new version with both of these things fixed, but of course I can do it a different way if it was intentional that we don't do the hash table size check for the broadcast in situations where we can't calculate the broadcast cost. -- To view, visit http://gerrit.cloudera.org:8080/4098 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I03a0f56f69c8deae68d48dfdb9dc95b71aec11f1 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2932: Extend DistributedPlanner to account for hash table build cost
Thomas Tauber-Marshall has uploaded a new patch set (#3). Change subject: IMPALA-2932: Extend DistributedPlanner to account for hash table build cost .. IMPALA-2932: Extend DistributedPlanner to account for hash table build cost When deciding between a broadcast or repartition join, Impala calculates the cost of each join as the total amount of data that is sent over the network. This ignores some relevant costs, and can lead to bad plans. One such relevant cost is the work to create the hash table used in the join. This patch accounts for this by adding the amount of data inserted into the hash table (the size of the right side of the join) to the previous cost. This generally increases the estimated cost of broadcast joins relative to repartitioning joins, as the broadcast join must build the hash table on each node the data was broadcast to, so its effect will be to make repartitioning joins more likely to be chosen, especially in large clusters. This patch has not yet been performance tested. Change-Id: I03a0f56f69c8deae68d48dfdb9dc95b71aec11f1 --- M fe/src/main/java/com/cloudera/impala/planner/DistributedPlanner.java M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test M testdata/workloads/functional-planner/queries/PlannerTest/joins.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-all.test M testdata/workloads/functional-planner/queries/PlannerTest/tpch-all.test M testdata/workloads/functional-planner/queries/PlannerTest/tpch-nested.test M testdata/workloads/functional-planner/queries/PlannerTest/union.test M testdata/workloads/functional-planner/queries/PlannerTest/views.test 8 files changed, 153 insertions(+), 130 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/98/4098/3 -- To view, visit http://gerrit.cloudera.org:8080/4098 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I03a0f56f69c8deae68d48dfdb9dc95b71aec11f1 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-2932: Extend DistributedPlanner to account for hash table build cost
Marcel Kornacker has posted comments on this change. Change subject: IMPALA-2932: Extend DistributedPlanner to account for hash table build cost .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/4098/2/fe/src/main/java/com/cloudera/impala/planner/DistributedPlanner.java File fe/src/main/java/com/cloudera/impala/planner/DistributedPlanner.java: Line 461: partitionCost = Math.round(lhsNetworkCost + rhsNetworkCost + rhsDataSize); you're not guaranteed to have computed rhsDataSize at this point -- To view, visit http://gerrit.cloudera.org:8080/4098 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I03a0f56f69c8deae68d48dfdb9dc95b71aec11f1 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3944: Fix crash in SimpleScheduler
Lars Volker has uploaded a new change for review. http://gerrit.cloudera.org:8080/4102 Change subject: IMPALA-3944: Fix crash in SimpleScheduler .. IMPALA-3944: Fix crash in SimpleScheduler SimpleScheduler::BackendConfig maintains two maps, hostname->IP and IP->[list of backends]. In situations where multiple backends run on a single host, BackendConfig::RemoveBackend was erroneously removing the hostname->IP mapping, even when the list of backends for a host was not empty. The fix is to only remove this mapping when the last backend of a host gets removed. I will push a test in a separate change. Change-Id: I1c2c58c2f9e7de0936aba03dcce86cdc4b31a866 --- M be/src/scheduling/simple-scheduler.cc 1 file changed, 11 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/02/4102/1 -- To view, visit http://gerrit.cloudera.org:8080/4102 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I1c2c58c2f9e7de0936aba03dcce86cdc4b31a866 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker
[Impala-ASF-CR] IMPALA-1731,IMPALA-3868: Float values are not parsed correctly
Internal Jenkins has posted comments on this change. Change subject: IMPALA-1731,IMPALA-3868: Float values are not parsed correctly .. Patch Set 5: Verified-1 Build failed: http://sandbox.jenkins.cloudera.com/job/impala-external-gerrit-verify-merge-ASF/124/ -- To view, visit http://gerrit.cloudera.org:8080/3791 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9e17d0f051b300a22a520ce34e276c2d4460d35e Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Attila Jeges Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Ho Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4014: Introduce query-wide execution state.
Sailesh Mukil has uploaded a new patch set (#10). Change subject: IMPALA-4014: Introduce query-wide execution state. .. IMPALA-4014: Introduce query-wide execution state. This patch is a header preview of a query wide execution state for a backend. Changes in design are as follows: FragmentMgr -> QueryExecMgr The QueryExecMgr now receives incoming fragments, creates a new QueryExecState if it is the first fragment to arrive for that query, and passes this received fragment along to the QueryExecState. This class is responsible for cleaning up the QueryExecState. QueryExecState (new): This is the query wide state for the backend. It is initialized by the first fragment to arrive to the QueryExecMgr. This class is now responsible for creating FragmentExecStates and executing them. Its life is protected by a ref counting mechanism and it is scheduled for destruction once the ref count reaches zero. Once scheduled for destruction, a thread in the QueryExecMgr will destroy the QueryExecState. Every user of the QueryExecState must access it within the scope of ScopedQueryExecStateRef which guarantees ref count incrementing and decrementing at entry/exit of the scope. QueryExecState (old) -> ClientRequestExecState: This is just a class name change. We still do not include shared state into the QueryExecState because there needs to be changes to DescriptorTbl, etc. before we can incorporate them into the QueryExecState. Will be done as separate patches. P.S: This only shows what the changes would look like and as of yet still does not compile. Excludes: - Some class renames in places not important to the core patch logic. - Most .cc files to make use of changes made. Change-Id: I892091d6401acb2ea91ccb1623af54c6f9635e6c --- M be/src/runtime/CMakeLists.txt M be/src/runtime/coordinator.cc M be/src/runtime/coordinator.h R be/src/runtime/fragment-instance-exec-state.cc R be/src/runtime/fragment-instance-exec-state.h R be/src/runtime/fragment-instance-executor.cc R be/src/runtime/fragment-instance-executor.h A be/src/runtime/query-exec-state.cc A be/src/runtime/query-exec-state.h M be/src/runtime/runtime-state.cc M be/src/runtime/runtime-state.h M be/src/service/CMakeLists.txt R be/src/service/client-request-exec-state.cc R be/src/service/client-request-exec-state.h D be/src/service/fragment-mgr.cc D be/src/service/fragment-mgr.h M be/src/service/impala-internal-service.h A be/src/service/query-exec-mgr.cc A be/src/service/query-exec-mgr.h M common/thrift/ImpalaInternalService.thrift 20 files changed, 735 insertions(+), 360 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/17/3817/10 -- To view, visit http://gerrit.cloudera.org:8080/3817 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I892091d6401acb2ea91ccb1623af54c6f9635e6c Gerrit-PatchSet: 10 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Sailesh Mukil Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Sailesh Mukil
[Impala-ASF-CR] IMPALA-4014: Introduce query-wide execution state.
Sailesh Mukil has posted comments on this change. Change subject: IMPALA-4014: Introduce query-wide execution state. .. Patch Set 10: (1 comment) http://gerrit.cloudera.org:8080/#/c/3817/7/be/src/service/query-exec-mgr.cc File be/src/service/query-exec-mgr.cc: Line 85: > The decrementing of ref cnt occurs in ReleaseQueryExecState, which will rem Ah yes, my bad. I've made this a DCHECK -- To view, visit http://gerrit.cloudera.org:8080/3817 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I892091d6401acb2ea91ccb1623af54c6f9635e6c Gerrit-PatchSet: 10 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Sailesh Mukil Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Sailesh Mukil Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3832: test invalid data handling in lzo text scanner
Tim Armstrong has posted comments on this change. Change subject: IMPALA-3832: test invalid data handling in lzo text scanner .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/4096/3/tests/query_test/test_scanners_fuzz.py File tests/query_test/test_scanners_fuzz.py: PS3, Line 143: corrupt LZO index file > should we file a JIRA for this? technically we should skip the file when a Done -- To view, visit http://gerrit.cloudera.org:8080/4096 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ib707014c1fcfb80cb8076f644fc2b62a5ae758d7 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3832: test invalid data handling in lzo text scanner
Tim Armstrong has posted comments on this change. Change subject: IMPALA-3832: test invalid data handling in lzo text scanner .. Patch Set 4: Code-Review+2 Carry +2 -- To view, visit http://gerrit.cloudera.org:8080/4096 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ib707014c1fcfb80cb8076f644fc2b62a5ae758d7 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3832: test invalid data handling in lzo text scanner
Hello Dan Hecht, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/4096 to look at the new patch set (#4). Change subject: IMPALA-3832: test invalid data handling in lzo text scanner .. IMPALA-3832: test invalid data handling in lzo text scanner This adds the lzo text scanner to the fuzz testing. I ran the test in a loop under ASAN overnight and didn't see any failures. Change-Id: Ib707014c1fcfb80cb8076f644fc2b62a5ae758d7 --- M tests/query_test/test_scanners_fuzz.py 1 file changed, 17 insertions(+), 8 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/96/4096/4 -- To view, visit http://gerrit.cloudera.org:8080/4096 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ib707014c1fcfb80cb8076f644fc2b62a5ae758d7 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-2550 Introduce query-wide execution context.
Dan Hecht has posted comments on this change. Change subject: IMPALA-2550 Introduce query-wide execution context. .. Patch Set 7: (1 comment) http://gerrit.cloudera.org:8080/#/c/3817/7/be/src/service/query-exec-mgr.cc File be/src/service/query-exec-mgr.cc: Line 85: if (i->second->num_current_references_ == 0) return NULL; > No that's not how it works. The decrementing of ref cnt occurs in ReleaseQueryExecState, which will remove from the map when the cnt hits 0. So we never have a QES in the map with cnt == 0. -- To view, visit http://gerrit.cloudera.org:8080/3817 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I892091d6401acb2ea91ccb1623af54c6f9635e6c Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Sailesh Mukil Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Sailesh Mukil Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3832: test invalid data handling in lzo text scanner
Dan Hecht has posted comments on this change. Change subject: IMPALA-3832: test invalid data handling in lzo text scanner .. Patch Set 3: Code-Review+2 (1 comment) http://gerrit.cloudera.org:8080/#/c/4096/3/tests/query_test/test_scanners_fuzz.py File tests/query_test/test_scanners_fuzz.py: PS3, Line 143: corrupt LZO index file should we file a JIRA for this? technically we should skip the file when abort_on_error=0 in this case, right? -- To view, visit http://gerrit.cloudera.org:8080/4096 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ib707014c1fcfb80cb8076f644fc2b62a5ae758d7 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4009: qgen: add docs, utility for installing Oracle as ref database
Michael Brown has posted comments on this change. Change subject: IMPALA-4009: qgen: add docs, utility for installing Oracle as ref database .. Patch Set 2: Thanks for the reviews. In patch set 2, I added a small utility to verify installation and connectivity. Thomas, let me address your comments in the next patch set. -- To view, visit http://gerrit.cloudera.org:8080/4095 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ib3bf84b7afa8851c49a8d0f0a1ceded94f4de158 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Brown Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4009: qgen: add docs, utility for installing Oracle as ref database
Hello Matthew Jacobs, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/4095 to look at the new patch set (#2). Change subject: IMPALA-4009: qgen: add docs, utility for installing Oracle as ref database .. IMPALA-4009: qgen: add docs, utility for installing Oracle as ref database This patch adds documentation and a small utility for users wishing to run the random query generator against Oracle as a reference database. In particular, the patch * points users toward installing Oracle Database. * provides directions on incorporating a Python Oracle client (cx_Oracle) into our impala-python virtual environment. * provides a small utility that will verify installation and connectivity between client and server This documentation isn't meant to be a complete guide. Other patches will add documentation for actually loading random query generator test data into Oracle, for example. Change-Id: Ib3bf84b7afa8851c49a8d0f0a1ceded94f4de158 --- A tests/comparison/ORACLE.txt M tests/comparison/README A tests/comparison/util/verify-oracle-connection.py 3 files changed, 176 insertions(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/4095/2 -- To view, visit http://gerrit.cloudera.org:8080/4095 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ib3bf84b7afa8851c49a8d0f0a1ceded94f4de158 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Brown Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-4009: qgen: add documentation for installing Oracle as ref database
Thomas Tauber-Marshall has posted comments on this change. Change subject: IMPALA-4009: qgen: add documentation for installing Oracle as ref database .. Patch Set 1: (1 comment) I performed the steps (with the docker instructions), and it all works, modulo the few things specific to my environment that instructions like these can't really anticipate. http://gerrit.cloudera.org:8080/#/c/4095/1/tests/comparison/ORACLE.txt File tests/comparison/ORACLE.txt: Line 68: 1. Refer to the instant-client installation instructions. As of this I'm confused by the structure here, since these instructions are referring to the same step as the "1. Get Oracle instant-client" above. I get that its "pre-reqs" vs. "installation instructions", but I think it would be clearer if there was just one ordered list of instructions, with installing the pre-reqs as the first couple of bullet points. -- To view, visit http://gerrit.cloudera.org:8080/4095 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ib3bf84b7afa8851c49a8d0f0a1ceded94f4de158 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Brown Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2550 Introduce query-wide execution context.
Sailesh Mukil has posted comments on this change. Change subject: IMPALA-2550 Introduce query-wide execution context. .. Patch Set 9: (3 comments) In this patch the following is addressed: - DescriptorTbl is not shared anymore. I've moved it back to RuntimeState as we still need to change how the serialization/deserialization is done before we can make it shared. It will be done in a separate patch. - QueryExecMgr now starts fragment threads so that refcount management stays in the scope of QueryExecMgr and is not passed to a lower level. - FragmentInstanceExecState refcount mechanism removed. Now these objects have the same lifetime as QueryExecState and are destroyed when QueryExecState is destroyed. - fragment_inst_exec_state_map_ is now indexed by fragment instance index and not fragment instance IDs. (Rebased on top of IMPALA-3988) http://gerrit.cloudera.org:8080/#/c/3817/7/be/src/service/query-exec-mgr.cc File be/src/service/query-exec-mgr.cc: Line 85: > how is this possible? it looks like an invariant that anything in the map w No that's not how it works. Everytime a new fragment comes in, FindOrCreateQueryExecState() is called (in RegisterFragmentInstanceWithQuery()) which increases the refcount by 1. Everytime a fragment completes (in QEM->ExecInstance()), ReleaseQES() is called which decrements the refcount by 1. This means that this refcount will ultimately reach 0. Alternatively CancelPlanFragment() and PublishFilter() use QESGuard() which calls a GetQES() at the beginning of its scope and a ReleaseQES() when it goes out of scope which will effectively balance the refcount. http://gerrit.cloudera.org:8080/#/c/3817/8/be/src/service/query-exec-mgr.cc File be/src/service/query-exec-mgr.cc: Line 54: QueryExecState* query_exec_state = FindOrCreateQueryExecState(exec_params.query_ctx); > comment seems redundant with the code (especially if you rename the method Done PS8, Line 100: > FindOrCreateQueryExecState() given that it creates (as opposed to inserting Done -- To view, visit http://gerrit.cloudera.org:8080/3817 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I892091d6401acb2ea91ccb1623af54c6f9635e6c Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Sailesh Mukil Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Sailesh Mukil Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2550 Introduce query-wide execution context.
Sailesh Mukil has uploaded a new patch set (#9). Change subject: IMPALA-2550 Introduce query-wide execution context. .. IMPALA-2550 Introduce query-wide execution context. This patch is a header preview of a query wide execution state for a backend. Changes in design are as follows: FragmentMgr -> QueryExecMgr The QueryExecMgr now receives incoming fragments, creates a new QueryExecState if it is the first fragment to arrive for that query, and passes this received fragment along to the QueryExecState. This class is responsible for cleaning up the QueryExecState. QueryExecState (new): This is the query wide state for the backend. It is initialized by the first fragment to arrive to the QueryExecMgr. This class is now responsible for creating FragmentExecStates and executing them. Its life is protected by a ref counting mechanism and it is scheduled for destruction once the ref count reaches zero. Once scheduled for destruction, a thread in the QueryExecMgr will destroy the QueryExecState. Every user of the QueryExecState must access it within the scope of ScopedQueryExecStateRef which guarantees ref count incrementing and decrementing at entry/exit of the scope. QueryExecState (old) -> ClientRequestExecState: This is just a class name change. We still do not include shared state into the QueryExecState because there needs to be changes to DescriptorTbl, etc. before we can incorporate them into the QueryExecState. Will be done as separate patches. P.S: This only shows what the changes would look like and as of yet still does not compile. Excludes: - Some class renames in places not important to the core patch logic. - Most .cc files to make use of changes made. Change-Id: I892091d6401acb2ea91ccb1623af54c6f9635e6c --- M be/src/runtime/CMakeLists.txt M be/src/runtime/coordinator.cc M be/src/runtime/coordinator.h R be/src/runtime/fragment-instance-exec-state.cc R be/src/runtime/fragment-instance-exec-state.h R be/src/runtime/fragment-instance-executor.cc R be/src/runtime/fragment-instance-executor.h A be/src/runtime/query-exec-state.cc A be/src/runtime/query-exec-state.h M be/src/runtime/runtime-state.cc M be/src/runtime/runtime-state.h M be/src/service/CMakeLists.txt R be/src/service/client-request-exec-state.cc R be/src/service/client-request-exec-state.h D be/src/service/fragment-mgr.cc D be/src/service/fragment-mgr.h M be/src/service/impala-internal-service.h A be/src/service/query-exec-mgr.cc A be/src/service/query-exec-mgr.h M common/thrift/ImpalaInternalService.thrift 20 files changed, 735 insertions(+), 360 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/17/3817/9 -- To view, visit http://gerrit.cloudera.org:8080/3817 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I892091d6401acb2ea91ccb1623af54c6f9635e6c Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Sailesh Mukil Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Sailesh Mukil
[Impala-ASF-CR] IMPALA-3629: Codegen TransferScratchTuples() in hdfs-parquet-scanner
Internal Jenkins has posted comments on this change. Change subject: IMPALA-3629: Codegen TransferScratchTuples() in hdfs-parquet-scanner .. Patch Set 1: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/4093 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ic327e437c7cd2b3f92cdb11c1e907bfee2d44ee8 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3629: Codegen TransferScratchTuples() in hdfs-parquet-scanner
Internal Jenkins has submitted this change and it was merged. Change subject: IMPALA-3629: Codegen TransferScratchTuples() in hdfs-parquet-scanner .. IMPALA-3629: Codegen TransferScratchTuples() in hdfs-parquet-scanner We currently don't do any codegen in the hdfs-parquet scanner, which limits performance. This patch creates a new function, ProcessScratchBatch, which contains the inner loop of TransferScratchTuples, allowing us to cross-compile only this performance critical part. It also uses CodegenEvalConjuncts to replace the call to EvalConjuncts in ProcessScratchBatch with a codegen'd version. Additionally, it modifies the Codegen functions in hdfs-avro/text/sequence-scanner to take an output parameter for the codegen'd function and return a Status in order to improve logging around codegen failures. Change-Id: Ic327e437c7cd2b3f92cdb11c1e907bfee2d44ee8 Reviewed-on: http://gerrit.cloudera.org:8080/4093 Reviewed-by: Thomas Tauber-Marshall Reviewed-by: Matthew Jacobs Tested-by: Internal Jenkins --- M be/src/codegen/gen_ir_descriptions.py M be/src/codegen/impala-ir.cc M be/src/exec/CMakeLists.txt M be/src/exec/hdfs-avro-scanner.cc M be/src/exec/hdfs-avro-scanner.h A be/src/exec/hdfs-parquet-scanner-ir.cc M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-parquet-scanner.h M be/src/exec/hdfs-scan-node.cc M be/src/exec/hdfs-scanner.cc M be/src/exec/hdfs-scanner.h M be/src/exec/hdfs-sequence-scanner.cc M be/src/exec/hdfs-sequence-scanner.h M be/src/exec/hdfs-text-scanner.cc M be/src/exec/hdfs-text-scanner.h 15 files changed, 276 insertions(+), 128 deletions(-) Approvals: Matthew Jacobs: Looks good to me, approved Internal Jenkins: Verified Thomas Tauber-Marshall: Looks good to me, but someone else must approve -- To view, visit http://gerrit.cloudera.org:8080/4093 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: Ic327e437c7cd2b3f92cdb11c1e907bfee2d44ee8 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-2932: Extend DistributedPlanner to account for hash table build cost
Thomas Tauber-Marshall has posted comments on this change. Change subject: IMPALA-2932: Extend DistributedPlanner to account for hash table build cost .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/4098/1/fe/src/main/java/com/cloudera/impala/planner/DistributedPlanner.java File fe/src/main/java/com/cloudera/impala/planner/DistributedPlanner.java: Line 463: LOG.debug("partition: cost=" + Long.toString(partitionCost)); > this isn't correct, because rhsCost is the network cost (which can be 0 if Done -- To view, visit http://gerrit.cloudera.org:8080/4098 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I03a0f56f69c8deae68d48dfdb9dc95b71aec11f1 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2932: Extend DistributedPlanner to account for hash table build cost
Thomas Tauber-Marshall has uploaded a new patch set (#2). Change subject: IMPALA-2932: Extend DistributedPlanner to account for hash table build cost .. IMPALA-2932: Extend DistributedPlanner to account for hash table build cost When deciding between a broadcast or repartition join, Impala calculates the cost of each join as the total amount of data that is sent over the network. This ignores some relevant costs, and can lead to bad plans. One such relevant cost is the work to create the hash table used in the join. This patch accounts for this by adding the amount of data inserted into the hash table (the size of the right side of the join) to the previous cost. This generally increases the estimated cost of broadcast joins relative to repartitioning joins, as the broadcast join must build the hash table on each node the data was broadcast to, so its effect will be to make repartitioning joins more likely to be chosen, especially in large clusters. This patch has not yet been performance tested. Change-Id: I03a0f56f69c8deae68d48dfdb9dc95b71aec11f1 --- M fe/src/main/java/com/cloudera/impala/planner/DistributedPlanner.java M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test M testdata/workloads/functional-planner/queries/PlannerTest/joins.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-all.test M testdata/workloads/functional-planner/queries/PlannerTest/tpch-all.test M testdata/workloads/functional-planner/queries/PlannerTest/tpch-nested.test M testdata/workloads/functional-planner/queries/PlannerTest/union.test M testdata/workloads/functional-planner/queries/PlannerTest/views.test 8 files changed, 150 insertions(+), 129 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/98/4098/2 -- To view, visit http://gerrit.cloudera.org:8080/4098 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I03a0f56f69c8deae68d48dfdb9dc95b71aec11f1 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Mostafa Mokhtar
[Impala-ASF-CR] IMPALA-3945: Forbid create text table with nonsensical delimiter combinations.
Tim Armstrong has posted comments on this change. Change subject: IMPALA-3945: Forbid create text table with nonsensical delimiter combinations. .. Patch Set 3: Can we abandon this for now if we don't have any intent to merge it? -- To view, visit http://gerrit.cloudera.org:8080/3839 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I836b45f930e680ea03274405b90b1ee14822152b Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Yuanhao Luo Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Yuanhao Luo Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4006: dangerous rm -rf statements in scripts
Michael Ho has posted comments on this change. Change subject: IMPALA-4006: dangerous rm -rf statements in scripts .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/4078/1/bin/clean.sh File bin/clean.sh: PS1, Line 49: $IMPALA_ALL_LOGS_DIRS > Yes and no. On one hand, this is supposed to contain multiple log, so havin May be I misunderstood your comment but for all practical purposes, we don't usually use multiple different log directories. -- To view, visit http://gerrit.cloudera.org:8080/4078 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I7503794180dee99eeb979e67f34e3b2edade70fe Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zoltan Ivanfi Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Ivanfi Gerrit-HasComments: Yes
[Impala-CR](cdh5-trunk) IMPALA-2809: Improve ByteSwap with builtin function or SSSE3 or AVX2.
Tim Armstrong has posted comments on this change. Change subject: IMPALA-2809: Improve ByteSwap with builtin function or SSSE3 or AVX2. .. Patch Set 47: Should I start the merge for this? -- To view, visit http://gerrit.cloudera.org:8080/3081 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I392ed5a8d5683f30f161282c228c1aedd7b648c1 Gerrit-PatchSet: 47 Gerrit-Project: Impala Gerrit-Branch: cdh5-trunk Gerrit-Owner: Youwei Wang Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Youwei Wang Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4006: dangerous rm -rf statements in scripts
Michael Ho has posted comments on this change. Change subject: IMPALA-4006: dangerous rm -rf statements in scripts .. Patch Set 3: (5 comments) It appears that you are already fixing some of the places which are considered "non-dangerous". Can you please just bite the bullet and fix up the rest pointed out below ? http://gerrit.cloudera.org:8080/#/c/4078/3/bin/impala-config.sh File bin/impala-config.sh: PS3, Line 45: $IMPALA_HOME/toolchain missing quote ? http://gerrit.cloudera.org:8080/#/c/4078/3/bin/run-all-tests.sh File bin/run-all-tests.sh: PS3, Line 111: ${IMPALA_HOME}/testdata/bin/split-hbase.sh missing quote ? PS3, Line 137: ${IMPALA_HOME}/bin/run-workload.py missing quote ? PS3, Line 173: ${IMPALA_HOME}/bin/start-impala-cluster.py missing quote ? PS3, Line 177: ${IMPALA_HOME}/bin/mvn-quiet.sh missing quote ? -- To view, visit http://gerrit.cloudera.org:8080/4078 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I7503794180dee99eeb979e67f34e3b2edade70fe Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zoltan Ivanfi Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Ivanfi Gerrit-HasComments: Yes
[Impala-CR](cdh5-trunk) IMPALA-2700: ASCII NUL characters are doubled on insert into text tables
Tim Armstrong has abandoned this change. Change subject: IMPALA-2700: ASCII NUL characters are doubled on insert into text tables .. Abandoned This was already merged to master: https://gerrit.cloudera.org/#/c/3876/ -- To view, visit http://gerrit.cloudera.org:8080/3703 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: abandon Gerrit-Change-Id: Ia30fa314d1ee1e99f9e7598466eb1570ca7940fc Gerrit-PatchSet: 5 Gerrit-Project: Impala Gerrit-Branch: cdh5-trunk Gerrit-Owner: anujphadke Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Huaisi Xu Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: anujphadke
[Impala-ASF-CR] IMPALA-3832: test invalid data handling in lzo text scanner
Tim Armstrong has posted comments on this change. Change subject: IMPALA-3832: test invalid data handling in lzo text scanner .. Patch Set 2: (2 comments) http://gerrit.cloudera.org:8080/#/c/4096/2/tests/query_test/test_scanners_fuzz.py File tests/query_test/test_scanners_fuzz.py: Line 61: pytest.skip() > why? We don't have decimal_tbl for these file formats. Added comment. Line 144: table_format.compression_codec != 'none'): > what errors do we produce? Good point. I see a lot of: Error while reading index file: hdfs://localhost:20500/test-warehouse/test_fuzz_alltypes_c35099ab.db/alltypes/year=2010/month=2/22_0.lzo.index Updated comment -- To view, visit http://gerrit.cloudera.org:8080/4096 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ib707014c1fcfb80cb8076f644fc2b62a5ae758d7 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3832: test invalid data handling in lzo text scanner
Tim Armstrong has uploaded a new patch set (#3). Change subject: IMPALA-3832: test invalid data handling in lzo text scanner .. IMPALA-3832: test invalid data handling in lzo text scanner This adds the lzo text scanner to the fuzz testing. I ran the test in a loop under ASAN overnight and didn't see any failures. Change-Id: Ib707014c1fcfb80cb8076f644fc2b62a5ae758d7 --- M tests/query_test/test_scanners_fuzz.py 1 file changed, 16 insertions(+), 8 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/96/4096/3 -- To view, visit http://gerrit.cloudera.org:8080/4096 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ib707014c1fcfb80cb8076f644fc2b62a5ae758d7 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht
[Impala-ASF-CR] IMPALA-3938: Disallow implicit references between nested collections.
Alex Behm has posted comments on this change. Change subject: IMPALA-3938: Disallow implicit references between nested collections. .. Patch Set 3: (5 comments) http://gerrit.cloudera.org:8080/#/c/4079/3/fe/src/test/java/com/cloudera/impala/analysis/AnalyzeStmtsTest.java File fe/src/test/java/com/cloudera/impala/analysis/AnalyzeStmtsTest.java: Line 578: // Tests that an attempted implicit match must be succeeded by an explicit match. Please adjust this comment. It is no longer accurate after your changes. Same for the comments in this test cluster below. This is basically the same case as the one you fixed below (implicit references cannot go through collections) Line 589: // - however, we are not allowed to implicitly skip 'item' again, since we have Please adjust this comment also. Line 591: // - the rule is: an implicit match must be followed by an explicit one Please adjust this comment also. What is the rule? Line 594: AnalysisError("select 1 from d.t8.s1.s2.a, a.f", you could you add a few tests below that are similar to this? i.e, the invalid reference is in the FROM clause Line 597: // Test to validate that implicit references between collection types cannot be used. Tests that implicit references are not allowed through collection types. -- To view, visit http://gerrit.cloudera.org:8080/4079 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I703c7e9c8c9c95c9fee714767367373eec10cd0c Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Christopher Channing Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Christopher Channing Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3832: test invalid data handling in lzo text scanner
Dan Hecht has posted comments on this change. Change subject: IMPALA-3832: test invalid data handling in lzo text scanner .. Patch Set 2: (2 comments) http://gerrit.cloudera.org:8080/#/c/4096/2/tests/query_test/test_scanners_fuzz.py File tests/query_test/test_scanners_fuzz.py: Line 61: pytest.skip() why? Line 144: table_format.compression_codec != 'none'): what errors do we produce? -- To view, visit http://gerrit.cloudera.org:8080/4096 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ib707014c1fcfb80cb8076f644fc2b62a5ae758d7 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2550 Introduce query-wide execution context.
Dan Hecht has posted comments on this change. Change subject: IMPALA-2550 Introduce query-wide execution context. .. Patch Set 8: (3 comments) Just to summarize some key points from the discussion today so we don't forget: - We agreed that ref counting is the way to go and explicit ref counting avoids "accidental" leakage of shared_ptr copies. - We agreed to restructure the fragment instance thread code so that the reference held by that thread would be done from the top-level thread method, rather than the reference being passed implicitly between classes like in the current patchset. - We agreed to try to have a hierarchical structure to the code to make ownership and lifetime as clear as possible. - We agreed that we only need reference counting at the QES level. Everything "below" would have the same lifetime as QES and does not need separate ref counting. http://gerrit.cloudera.org:8080/#/c/3817/8/be/src/service/query-exec-mgr.cc File be/src/service/query-exec-mgr.cc: Line 54: // Create or retrieve the QueryExecState for this query id. comment seems redundant with the code (especially if you rename the method as suggested below). Line 85: if (i->second->num_current_references_ == 0) return NULL; how is this possible? it looks like an invariant that anything in the map will have refcnt > 0. PS8, Line 100: Insert FindOrCreateQueryExecState() given that it creates (as opposed to inserting one that was created by the caller). -- To view, visit http://gerrit.cloudera.org:8080/3817 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I892091d6401acb2ea91ccb1623af54c6f9635e6c Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Sailesh Mukil Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Sailesh Mukil Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-(3895,3859): Don't log file data on parse errors
Hello Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/4020 to look at the new patch set (#5). Change subject: IMPALA-(3895,3859): Don't log file data on parse errors .. IMPALA-(3895,3859): Don't log file data on parse errors Logging file or table data is a bad idea, and doing it by default is particularly bad. This patch changes HdfsScanNode::LogRowParseError() to log a file and offset only. Testing: See rewritten tests. To support testing this change, we also fix IMPALA-3895, by introducing a canonical string __HDFS_FILENAME__ that all Hadoop filenames in the ERROR output are replaced with before comparing with the expected results. This fixes a number of issues with the old way of matching filenames which purported to be a regex, but really wasn't. In particular, we can now match the rest of an ERROR line after the filename, which was not possible before. In some cases, we don't want to substitute filenames because the ERROR output is looking for a very specific output. In that case we can write: $NAMENODE/ and this patch will not perform _any_ filename substitutions on ERROR sections that contain the $NAMENODE string. Finally, this patch fixes a bug where a test that had an ERRORS section but no RESULTS section would silently pass without testing anything. Change-Id: I5a604f8784a9ff7b4bf878f82ee7f56697df3272 --- M be/src/exec/hdfs-scanner-ir.cc M be/src/exec/hdfs-scanner.cc M be/src/exec/hdfs-scanner.h M be/src/exec/hdfs-sequence-scanner.cc M be/src/exec/hdfs-sequence-scanner.h M be/src/exec/hdfs-text-scanner.cc M be/src/exec/hdfs-text-scanner.h M testdata/workloads/functional-query/queries/DataErrorsTest/avro-errors.test M testdata/workloads/functional-query/queries/DataErrorsTest/hbase-scan-node-errors.test M testdata/workloads/functional-query/queries/DataErrorsTest/hdfs-rcfile-scan-node-errors.test M testdata/workloads/functional-query/queries/DataErrorsTest/hdfs-scan-node-errors.test M testdata/workloads/functional-query/queries/DataErrorsTest/hdfs-sequence-scan-errors.test M testdata/workloads/functional-query/queries/QueryTest/parquet-continue-on-error.test M testdata/workloads/functional-query/queries/QueryTest/strict-mode-abort.test M testdata/workloads/functional-query/queries/QueryTest/strict-mode.test M tests/common/impala_test_suite.py M tests/common/test_result_verifier.py M tests/util/filesystem_utils.py M tests/util/hdfs_util.py 19 files changed, 393 insertions(+), 408 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/20/4020/5 -- To view, visit http://gerrit.cloudera.org:8080/4020 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I5a604f8784a9ff7b4bf878f82ee7f56697df3272 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry Robinson Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-(3895,3859): Don't log file data on parse errors
Dan Hecht has posted comments on this change. Change subject: IMPALA-(3895,3859): Don't log file data on parse errors .. Patch Set 5: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/4020 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I5a604f8784a9ff7b4bf878f82ee7f56697df3272 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry Robinson Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-(3895,3859): Don't log file data on parse errors
Dan Hecht has posted comments on this change. Change subject: IMPALA-(3895,3859): Don't log file data on parse errors .. Patch Set 4: > This passes a core test run. Running exhaustive, and local (to test > FS changes) now. Did those tests pass? If not, do you want me to take over fixing the tests since you are on PTO this week? -- To view, visit http://gerrit.cloudera.org:8080/4020 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I5a604f8784a9ff7b4bf878f82ee7f56697df3272 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry Robinson Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-2932: Extend DistributedPlanner to account for hash table build cost
Marcel Kornacker has posted comments on this change. Change subject: IMPALA-2932: Extend DistributedPlanner to account for hash table build cost .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/4098/1/fe/src/main/java/com/cloudera/impala/planner/DistributedPlanner.java File fe/src/main/java/com/cloudera/impala/planner/DistributedPlanner.java: Line 463: partitionCost = Math.round(lhsCost + 2 * rhsCost); this isn't correct, because rhsCost is the network cost (which can be 0 if the rhs is already partitioned in the desired way). you need to account for the hash table build cost independently. -- To view, visit http://gerrit.cloudera.org:8080/4098 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I03a0f56f69c8deae68d48dfdb9dc95b71aec11f1 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Mostafa Mokhtar Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4009: qgen: add documentation for installing Oracle as ref database
Matthew Jacobs has posted comments on this change. Change subject: IMPALA-4009: qgen: add documentation for installing Oracle as ref database .. Patch Set 1: Code-Review+1 This looks great, thanks for writing up the readme. I didn't walk through the steps but it makes sense. -- To view, visit http://gerrit.cloudera.org:8080/4095 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ib3bf84b7afa8851c49a8d0f0a1ceded94f4de158 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Brown Gerrit-Reviewer: Matthew Jacobs Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3938: Disallow implicit references between nested collections.
Christopher Channing has posted comments on this change. Change subject: IMPALA-3938: Disallow implicit references between nested collections. .. Patch Set 2: (4 comments) Done http://gerrit.cloudera.org:8080/#/c/4079/2/fe/src/main/java/com/cloudera/impala/analysis/Path.java File fe/src/main/java/com/cloudera/impala/analysis/Path.java: Line 227: && isOptionalFieldCollectionType((CollectionStructType)structType); > Based on this latest version I think the code can be simplified to not even Done http://gerrit.cloudera.org:8080/#/c/4079/2/fe/src/test/java/com/cloudera/impala/analysis/AnalyzeStmtsTest.java File fe/src/test/java/com/cloudera/impala/analysis/AnalyzeStmtsTest.java: Line 597: // Test implicit key/value reference is not abused when using a map within an array. > can you explain which resolution is not allowed? Done Line 598: addTestTable("create table d.t9 (c1 array>)"); > nit: use 'c' instead of 'c1' for consistency with the other tests Based on the comment below I've left this as c1 and added a new column c2 (adhered to the tests above). Line 599: AnalysisError("select key from d.t9.c1", > add similar tests for a map nested inside a map (we can run into the same i As discussed, I've added a map> -- To view, visit http://gerrit.cloudera.org:8080/4079 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I703c7e9c8c9c95c9fee714767367373eec10cd0c Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Christopher Channing Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Christopher Channing Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3938: Disallow implicit references between nested collections.
Christopher Channing has uploaded a new patch set (#3). Change subject: IMPALA-3938: Disallow implicit references between nested collections. .. IMPALA-3938: Disallow implicit references between nested collections. The Bug: Prior to this patch Impala would accept and execute a query of the form "select pos, key, value from tbl.arr_map_col order by pos asc", where "arr_map_col" is an array of map. In this example, the map is not referenced as a collection and therefore the implicit rule of using "item" should not be applied. The results produced from this erroneous query are arbitrary. The semantically correct way of expressing the query would be of the form "select a.pos, m.key, m.value from tbl t, t.arr_map_col a, a.item m order by a.pos asc". The Fix: Removed the existing 'expectExplicitMatch' flag to simplify the code. During query analysis (more specifically: path resolution), if part of a path cannot be explicitly matched then the planner will fall back to an implicit field check. As part of this check, the type of the nested field inside the collection will be inspected to verify that it does not match that of a collection type. If so, the path resolution will fail and an AnalysisException will be thrown indicating that the path cannot be resolved. For example, trying to execute the query "select pos, key, value from tbl.arr_map_col order by pos asc" would yield the following error "Could not resolve column/field reference: 'key'". Regarding test coverage, the existing test suite "AnalyzeStmtsTest" has been augmented with a set of new tests that will verify that implicit references between collections cannot be used. Change-Id: I703c7e9c8c9c95c9fee714767367373eec10cd0c --- M fe/src/main/java/com/cloudera/impala/analysis/Path.java M fe/src/test/java/com/cloudera/impala/analysis/AnalyzeStmtsTest.java 2 files changed, 31 insertions(+), 7 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/79/4079/3 -- To view, visit http://gerrit.cloudera.org:8080/4079 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I703c7e9c8c9c95c9fee714767367373eec10cd0c Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Christopher Channing Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Christopher Channing
[Impala-ASF-CR] Infomation schema (preview for frontend)
Kathy Sun has posted comments on this change. Change subject: Infomation_schema (preview for frontend) .. Patch Set 8: (13 comments) Please skip patch 7, it's a wrong reformat. Patch 8 is a new reformat using .clang_format below is the reply I was working on, those will take effect in next patch. http://gerrit.cloudera.org:8080/#/c/3863/6/be/src/exec/info-schema-scan-node.cc File be/src/exec/info-schema-scan-node.cc: PS6, Line 100: alue = m > Constants should be named like: COL_NAMES Done PS6, Line 136: xt** ctxs > I think we can get rid of the 'tuple_pool' variable and just use row_batch- Done Line 153: > This loop looks sufficient to me to materialise any number of rows. unfortunately... I think it's not. I'm still testing this. discuss offline. Line 168: > Our coding standard says to use '++idx'. Not a big deal but I think it's ea Done Line 192 > You can actually remove DebugString(). We don't have it for all scan nodes. nice, that's a relief. :P http://gerrit.cloudera.org:8080/#/c/3863/6/fe/src/main/java/com/cloudera/impala/analysis/AnalysisContext.java File fe/src/main/java/com/cloudera/impala/analysis/AnalysisContext.java: Line 532: case VIEW_METADATA: > Can you add a TODO like: Done Line 533: case SELECT: > There are some tabs instead of spaces here. Let's just figure out how to se Done http://gerrit.cloudera.org:8080/#/c/3863/6/fe/src/main/java/com/cloudera/impala/catalog/InformationSchemaTable.java File fe/src/main/java/com/cloudera/impala/catalog/InformationSchemaTable.java: PS6, Line 34: InformationSchemaTable > I spent some time thinking about this and I'm thinking we should call this to discuss in 1o1 Line 45: > Your text editor is leaving in trailing whitespace - it should be possible Done PS6, Line 52: Name > We should keep the column names lower-case (they're converted to lower-case Done Line 89: /** > Is this ever called? I think since isLoaded() is always true, this should n Done Line 109: TResultSetMetadata resultSchema = new TResultSetMetadata(); > Comment not needed. Done http://gerrit.cloudera.org:8080/#/c/3863/6/fe/src/main/java/com/cloudera/impala/planner/InfoSchemaScanNode.java File fe/src/main/java/com/cloudera/impala/planner/InfoSchemaScanNode.java: PS6, Line 42: InfoSchemaScanNode > Maybe this should be SystemTableScanNode (assuming we do the rename I sugge discuss offline -- To view, visit http://gerrit.cloudera.org:8080/3863 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I7adbeb45220c468e43b424d70c30b952f6cec2cd Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Kathy Sun Gerrit-Reviewer: Kathy Sun Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] Infomation schema (preview for frontend)
Kathy Sun has uploaded a new patch set (#8). Change subject: Infomation_schema (preview for frontend) .. Infomation_schema (preview for frontend) Since this is just a preview and not the final patch, request reviewers to please focus their comments more on the design aspect. This is to expose metadata (current status of impala, e.g impala-metrics) into virtual table, so that users could query them via sql. You can run impala-shell.sh and type query like: select * from information_schema.scheduler; select name, value, description from information_schema.scheduler; Change-Id: I7adbeb45220c468e43b424d70c30b952f6cec2cd --- M be/src/exec/CMakeLists.txt M be/src/exec/exec-node.cc A be/src/exec/info-schema-scan-node.cc A be/src/exec/info-schema-scan-node.h M be/src/runtime/coordinator.cc M be/src/runtime/descriptors.cc M be/src/runtime/descriptors.h M be/src/scheduling/simple-scheduler.cc M common/thrift/CatalogObjects.thrift M common/thrift/Descriptors.thrift M common/thrift/PlanNodes.thrift M fe/src/main/java/com/cloudera/impala/analysis/AnalysisContext.java M fe/src/main/java/com/cloudera/impala/analysis/Analyzer.java M fe/src/main/java/com/cloudera/impala/analysis/DescriptorTable.java M fe/src/main/java/com/cloudera/impala/analysis/TupleDescriptor.java M fe/src/main/java/com/cloudera/impala/catalog/Catalog.java A fe/src/main/java/com/cloudera/impala/catalog/InformationSchemaDb.java A fe/src/main/java/com/cloudera/impala/catalog/InformationSchemaTable.java A fe/src/main/java/com/cloudera/impala/planner/InfoSchemaScanNode.java M fe/src/main/java/com/cloudera/impala/planner/SingleNodePlanner.java M fe/src/main/java/com/cloudera/impala/service/Frontend.java M fe/src/test/java/com/cloudera/impala/analysis/AnalyzeStmtsTest.java M fe/src/test/java/com/cloudera/impala/planner/PlannerTest.java A testdata/workloads/functional-planner/queries/PlannerTest/info-schema.test 24 files changed, 709 insertions(+), 10 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/3863/8 -- To view, visit http://gerrit.cloudera.org:8080/3863 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I7adbeb45220c468e43b424d70c30b952f6cec2cd Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Kathy Sun Gerrit-Reviewer: Kathy Sun Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-2932: Extend DistributedPlanner to account for hash table build cost
Thomas Tauber-Marshall has uploaded a new change for review. http://gerrit.cloudera.org:8080/4098 Change subject: IMPALA-2932: Extend DistributedPlanner to account for hash table build cost .. IMPALA-2932: Extend DistributedPlanner to account for hash table build cost When deciding between a broadcast or repartition join, Impala calculates the cost of each join as the total amount of data that is sent over the network. This ignores some relevant costs, and can lead to bad plans. One such relevant cost is the work to create the hash table used in the join. This patch accounts for this by adding the amount of data inserted into the hash table (the size of the right side of the join) to the previous cost. This generally increases the estimated cost of broadcast joins relative to repartitioning joins, as the broadcast join must build the hash table on each node the data was broadcast to, so its effect will be to make repartitioning joins more likely to be chosen, especially in large clusters. This patch has not yet been performance tested. Change-Id: I03a0f56f69c8deae68d48dfdb9dc95b71aec11f1 --- M fe/src/main/java/com/cloudera/impala/planner/DistributedPlanner.java M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test M testdata/workloads/functional-planner/queries/PlannerTest/joins.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-all.test M testdata/workloads/functional-planner/queries/PlannerTest/tpch-all.test M testdata/workloads/functional-planner/queries/PlannerTest/tpch-nested.test M testdata/workloads/functional-planner/queries/PlannerTest/union.test M testdata/workloads/functional-planner/queries/PlannerTest/views.test 8 files changed, 148 insertions(+), 125 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/98/4098/1 -- To view, visit http://gerrit.cloudera.org:8080/4098 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I03a0f56f69c8deae68d48dfdb9dc95b71aec11f1 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-Marshall
[Impala-ASF-CR] Add .clang-format for Impala's C++ style
Tim Armstrong has posted comments on this change. Change subject: Add .clang-format for Impala's C++ style .. Patch Set 6: (1 comment) http://gerrit.cloudera.org:8080/#/c/3886/6/.clang-format File .clang-format: Line 1: BasedOnStyle: Google Google's style has DerivePointerAlignment = true, which means that if pointers are formatted like int *p in a file instead of int* p, it doesn't update them. I think we should override this to false in this patch or a follow-on. -- To view, visit http://gerrit.cloudera.org:8080/3886 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I274c5993c7be344fc4b7729d21a13da993f9f3aa Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Jim Apple Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-1731,IMPALA-3868: Float values are not parsed correctly
Michael Ho has posted comments on this change. Change subject: IMPALA-1731,IMPALA-3868: Float values are not parsed correctly .. Patch Set 5: Code-Review+2 Carry +2 forward. -- To view, visit http://gerrit.cloudera.org:8080/3791 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9e17d0f051b300a22a520ce34e276c2d4460d35e Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Attila Jeges Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Ho Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-1731,IMPALA-3868: Float values are not parsed correctly
Attila Jeges has uploaded a new patch set (#5). Change subject: IMPALA-1731,IMPALA-3868: Float values are not parsed correctly .. IMPALA-1731,IMPALA-3868: Float values are not parsed correctly Fixed StringToFloatInternal() not to parse strings like "1.23inf" and "infinite" with leading/trailing garbage as Infinity. These strings are now rejected with PARSE_FAILURE. Only "inf" and "infinity" are accepted, parsing is case-insensitive. "NaN" values are handled similarly: strings with leading/trailing garbage like "nana" are rejected, parsing is case-insensitive. Other changes: - StringToFloatInternal() was cleaned up a bit. Parsing inf and NaN strings was moved out of the main loop. - Use std::numeric_limits::infinity() instead of INFINITY macro and std::numeric_limits::quiet_NaN() instead of NAN macro. - Fixed another minor bug: multiple dots are allowed when parsing float values (e.g. "2.1..6" is interpreted as 2.16). - New BE and E2E tests were added. Change-Id: I9e17d0f051b300a22a520ce34e276c2d4460d35e --- M be/src/exprs/expr-test.cc M be/src/util/string-parser-test.cc M be/src/util/string-parser.h M testdata/workloads/functional-query/queries/QueryTest/exprs.test 4 files changed, 205 insertions(+), 41 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/91/3791/5 -- To view, visit http://gerrit.cloudera.org:8080/3791 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I9e17d0f051b300a22a520ce34e276c2d4460d35e Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Attila Jeges Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Ho
[Impala-ASF-CR] IMPALA-2550 Introduce query-wide execution context.
Dan Hecht has posted comments on this change. Change subject: IMPALA-2550 Introduce query-wide execution context. .. Patch Set 7: (2 comments) http://gerrit.cloudera.org:8080/#/c/3817/7/be/src/service/query-exec-mgr.cc File be/src/service/query-exec-mgr.cc: Line 85: if (i->second->num_current_references_ == 0) return NULL; how is this possible? it looks like an invariant that anything in the map will have refcnt > 0. http://gerrit.cloudera.org:8080/#/c/3817/7/be/src/service/query-exec-mgr.h File be/src/service/query-exec-mgr.h: PS7, Line 127: /// The thread pool that is in charge of destroying a QueryExecState when it is no : /// longer in use. : ThreadPool destroy_thread_; > Currently I've made it that way because if a fragment is the last user of t The QES doesn't delete itself. The QEM does it, so there isn't a problem of doing it directly. -- To view, visit http://gerrit.cloudera.org:8080/3817 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I892091d6401acb2ea91ccb1623af54c6f9635e6c Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Sailesh Mukil Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Sailesh Mukil Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3832: test invalid data handling in lzo text scanner
Tim Armstrong has uploaded a new change for review. http://gerrit.cloudera.org:8080/4096 Change subject: IMPALA-3832: test invalid data handling in lzo text scanner .. IMPALA-3832: test invalid data handling in lzo text scanner This adds the lzo text scanner to the fuzz testing. I ran the test in a loop under ASAN overnight and didn't see any failures. Change-Id: Ib707014c1fcfb80cb8076f644fc2b62a5ae758d7 --- M tests/query_test/test_scanners_fuzz.py 1 file changed, 13 insertions(+), 7 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/96/4096/1 -- To view, visit http://gerrit.cloudera.org:8080/4096 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: Ib707014c1fcfb80cb8076f644fc2b62a5ae758d7 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong
[Impala-ASF-CR] IMPALA-3893, IMPALA-3901: impala-shell prints incorrect coordinator address, overly verbose
Sailesh Mukil has submitted this change and it was merged. Change subject: IMPALA-3893, IMPALA-3901: impala-shell prints incorrect coordinator address, overly verbose .. IMPALA-3893, IMPALA-3901: impala-shell prints incorrect coordinator address, overly verbose The webserver address was always configured as 0.0.0.0 (meaning that the webserver could be reached on any IP for that machine) unless otherwise specified. This is not a correct value to dispay to the user. This patch returns the hostname of the node, when requested, if the webserver host address is 0.0.0.0. This patch also does not print the coordinator link for very simple queries, as it's not necessary and is unnecessarily verbose. This patch also does away with pinging the impalad an extra time per query for finding the host time and webserver address. It instead remembers the webserver address at connect time and displays client local time for every query instead. Change-Id: I9d167b66f2dd8629e40a7094d21ea7ce6b43d23b Reviewed-on: http://gerrit.cloudera.org:8080/3994 Tested-by: Internal Jenkins Reviewed-by: Sailesh Mukil Tested-by: Sailesh Mukil --- M be/src/service/impala-beeswax-server.cc M be/src/util/webserver.cc M common/thrift/ImpalaService.thrift M shell/impala_client.py M shell/impala_shell.py M tests/shell/test_shell_commandline.py 6 files changed, 69 insertions(+), 30 deletions(-) Approvals: Internal Jenkins: Verified Sailesh Mukil: Looks good to me, approved; Verified -- To view, visit http://gerrit.cloudera.org:8080/3994 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I9d167b66f2dd8629e40a7094d21ea7ce6b43d23b Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Sailesh Mukil Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Sailesh Mukil
[Impala-ASF-CR] IMPALA-3893, IMPALA-3901: impala-shell prints incorrect coordinator address, overly verbose
Sailesh Mukil has posted comments on this change. Change subject: IMPALA-3893, IMPALA-3901: impala-shell prints incorrect coordinator address, overly verbose .. Patch Set 6: Verified+1 > Patch Set 6: Code-Review+2 Carrying +2 again. This ran through GVM and got verified+1, but for some reason the CR+2 status got dropped. Since this passed GVM, submitting manually. -- To view, visit http://gerrit.cloudera.org:8080/3994 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9d167b66f2dd8629e40a7094d21ea7ce6b43d23b Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Sailesh Mukil Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Sailesh Mukil Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3893, IMPALA-3901: impala-shell prints incorrect coordinator address, overly verbose
Sailesh Mukil has posted comments on this change. Change subject: IMPALA-3893, IMPALA-3901: impala-shell prints incorrect coordinator address, overly verbose .. Patch Set 6: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/3994 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9d167b66f2dd8629e40a7094d21ea7ce6b43d23b Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Sailesh Mukil Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Sailesh Mukil Gerrit-HasComments: No
[Impala-ASF-CR] Infomation schema (preview for frontend)
Kathy Sun has posted comments on this change. Change subject: Infomation_schema (preview for frontend) .. Patch Set 7: (1 comment) Patch 7 is only for reformatting. you can skip reviewing this http://gerrit.cloudera.org:8080/#/c/3863/5/fe/src/main/java/com/cloudera/impala/analysis/TupleDescriptor.java File fe/src/main/java/com/cloudera/impala/analysis/TupleDescriptor.java: PS5, Line 206: Table table = getTable(); : TableId id = table.getId(); TODO: delete this -- To view, visit http://gerrit.cloudera.org:8080/3863 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I7adbeb45220c468e43b424d70c30b952f6cec2cd Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Kathy Sun Gerrit-Reviewer: Kathy Sun Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4009: qgen: add documentation for installing Oracle as ref database
Michael Brown has uploaded a new change for review. http://gerrit.cloudera.org:8080/4095 Change subject: IMPALA-4009: qgen: add documentation for installing Oracle as ref database .. IMPALA-4009: qgen: add documentation for installing Oracle as ref database This patch adds documentation for users wishing to run the random query generator against Oracle as a reference database. In particular, the patch points users toward installing Oracle Database, and it provides directions on incorporating a Python Oracle client into our impala-python virtual environment. This documentation isn't meant to be a complete guide. Other patches will add documentation for actually loading random query generator test data into Oracle, for example. Change-Id: Ib3bf84b7afa8851c49a8d0f0a1ceded94f4de158 --- A tests/comparison/ORACLE.txt M tests/comparison/README 2 files changed, 110 insertions(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/4095/1 -- To view, visit http://gerrit.cloudera.org:8080/4095 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: Ib3bf84b7afa8851c49a8d0f0a1ceded94f4de158 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Brown
[Impala-ASF-CR] Infomation schema (preview for frontend)
Kathy Sun has uploaded a new patch set (#7). Change subject: Infomation_schema (preview for frontend) .. Infomation_schema (preview for frontend) Since this is just a preview and not the final patch, request reviewers to please focus their comments more on the design aspect. This is to expose metadata (current status of impala, e.g impala-metrics) into virtual table, so that users could query them via sql. You can run impala-shell.sh and type query like: select * from information_schema.scheduler; select name, value, description from information_schema.scheduler; Change-Id: I7adbeb45220c468e43b424d70c30b952f6cec2cd --- M be/src/exec/CMakeLists.txt M be/src/exec/exec-node.cc A be/src/exec/info-schema-scan-node.cc A be/src/exec/info-schema-scan-node.h M be/src/runtime/coordinator.cc M be/src/runtime/descriptors.cc M be/src/runtime/descriptors.h M be/src/scheduling/simple-scheduler.cc M common/thrift/CatalogObjects.thrift M common/thrift/Descriptors.thrift M common/thrift/PlanNodes.thrift M fe/src/main/java/com/cloudera/impala/analysis/AnalysisContext.java M fe/src/main/java/com/cloudera/impala/analysis/Analyzer.java M fe/src/main/java/com/cloudera/impala/analysis/DescriptorTable.java M fe/src/main/java/com/cloudera/impala/analysis/TupleDescriptor.java M fe/src/main/java/com/cloudera/impala/catalog/Catalog.java A fe/src/main/java/com/cloudera/impala/catalog/InformationSchemaDb.java A fe/src/main/java/com/cloudera/impala/catalog/InformationSchemaTable.java A fe/src/main/java/com/cloudera/impala/planner/InfoSchemaScanNode.java M fe/src/main/java/com/cloudera/impala/planner/SingleNodePlanner.java M fe/src/main/java/com/cloudera/impala/service/Frontend.java M fe/src/test/java/com/cloudera/impala/analysis/AnalyzeStmtsTest.java M fe/src/test/java/com/cloudera/impala/planner/PlannerTest.java A testdata/workloads/functional-planner/queries/PlannerTest/info-schema.test 24 files changed, 739 insertions(+), 10 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/3863/7 -- To view, visit http://gerrit.cloudera.org:8080/3863 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I7adbeb45220c468e43b424d70c30b952f6cec2cd Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Kathy Sun Gerrit-Reviewer: Kathy Sun Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-3973: add position and occurrence to instr()
Zoltan Ivanfi has uploaded a new change for review. http://gerrit.cloudera.org:8080/4094 Change subject: IMPALA-3973: add position and occurrence to instr() .. IMPALA-3973: add position and occurrence to instr() Change-Id: Ie9648de458d243306fa14adc5e7f7002bf6f67fd --- M be/src/exprs/expr-test.cc M be/src/exprs/string-functions-ir.cc M be/src/exprs/string-functions.h M common/function-registry/impala_functions.py 4 files changed, 82 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/94/4094/1 -- To view, visit http://gerrit.cloudera.org:8080/4094 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: Ie9648de458d243306fa14adc5e7f7002bf6f67fd Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zoltan Ivanfi
[Impala-ASF-CR] IMPALA-3629: Codegen TransferScratchTuples() in hdfs-parquet-scanner
Matthew Jacobs has posted comments on this change. Change subject: IMPALA-3629: Codegen TransferScratchTuples() in hdfs-parquet-scanner .. Patch Set 1: Code-Review+2 Carrying the +2 -- To view, visit http://gerrit.cloudera.org:8080/4093 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ic327e437c7cd2b3f92cdb11c1e907bfee2d44ee8 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3938: Prevent illegal implicit collection references being used.
Alex Behm has posted comments on this change. Change subject: IMPALA-3938: Prevent illegal implicit collection references being used. .. Patch Set 2: (4 comments) http://gerrit.cloudera.org:8080/#/c/4079/2/fe/src/main/java/com/cloudera/impala/analysis/Path.java File fe/src/main/java/com/cloudera/impala/analysis/Path.java: Line 227: && isOptionalFieldCollectionType((CollectionStructType)structType); Based on this latest version I think the code can be simplified to not even use expectExplicitMatch at all, as follows: // Map all remaining raw-path elements to field types and positions. while (rawPathIdx < rawPath_.size()) { if (!currentType.isComplexType()) return false; StructType structType = getTypeAsStruct(currentType); // Resolve explicit path. StructField field = structType.getField(rawPath_.get(rawPathIdx)); if (field == null) { // Resolve implicit path. if (structType instanceof CollectionStructType) { field = ((CollectionStructType) structType).getOptionalField(); // Collections must be matched explicitly. if (field.getType().isCollectionType()) return false; } else { // Failed to resolve implicit or explicit path. return false; } // Update the physical types/positions. matchedTypes_.add(field.getType()); matchedPositions_.add(field.getPosition()); currentType = field.getType(); // Do not consume a raw-path element. continue; } matchedTypes_.add(field.getType()); matchedPositions_.add(field.getPosition()); if (field.getType().isCollectionType() && firstCollectionPathIdx_ == -1) { Preconditions.checkState(firstCollectionTypeIdx_ == -1); firstCollectionPathIdx_ = rawPathIdx; firstCollectionTypeIdx_ = matchedTypes_.size() - 1; } currentType = field.getType(); ++rawPathIdx; } http://gerrit.cloudera.org:8080/#/c/4079/2/fe/src/test/java/com/cloudera/impala/analysis/AnalyzeStmtsTest.java File fe/src/test/java/com/cloudera/impala/analysis/AnalyzeStmtsTest.java: Line 597: // Test implicit key/value reference is not abused when using a map within an array. can you explain which resolution is not allowed? "is not abused" is not very clear In fact, could you rephrase the comment in L578 and followin as well? The underlying issue seems to be that collections must always be matched explicitly. Do you agree? Line 598: addTestTable("create table d.t9 (c1 array>)"); nit: use 'c' instead of 'c1' for consistency with the other tests Line 599: AnalysisError("select key from d.t9.c1", add similar tests for a map nested inside a map (we can run into the same issue) -- To view, visit http://gerrit.cloudera.org:8080/4079 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I703c7e9c8c9c95c9fee714767367373eec10cd0c Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Christopher Channing Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Christopher Channing Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-1659: Netezza compatibility functions: metadata
Greg Rahn has posted comments on this change. Change subject: IMPALA-1659: Netezza compatibility functions: metadata .. Patch Set 2: I'm in favor of implementing CURRENT_SESSION then adding CURRENT_SID as an alias. -- To view, visit http://gerrit.cloudera.org:8080/4063 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9b5d1009bbf42acc175a942d2df484e1c64822ca Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zoltan Ivanfi Gerrit-Reviewer: Greg Rahn Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Ivanfi Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4006: dangerous rm -rf statements in scripts
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4006: dangerous rm -rf statements in scripts .. Patch Set 3: Code-Review+1 Will wait for Michael's +1 too. -- To view, visit http://gerrit.cloudera.org:8080/4078 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I7503794180dee99eeb979e67f34e3b2edade70fe Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zoltan Ivanfi Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Ivanfi Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-1659: Netezza compatibility functions: metadata
Tim Armstrong has posted comments on this change. Change subject: IMPALA-1659: Netezza compatibility functions: metadata .. Patch Set 2: (4 comments) http://gerrit.cloudera.org:8080/#/c/4063/2/be/src/exprs/expr-test.cc File be/src/exprs/expr-test.cc: Line 2702: TestStringValue("current_user()", "impala_test_user"); We should extend tests/custom_cluster/test_delegation.py to test that these return the right results when delegation is enabled. http://gerrit.cloudera.org:8080/#/c/4063/2/be/src/exprs/utility-functions-ir.cc File be/src/exprs/utility-functions-ir.cc: Line 121: return AnyValUtil::FromString( This should fit in a single 90-character line. http://gerrit.cloudera.org:8080/#/c/4063/2/common/function-registry/impala_functions.py File common/function-registry/impala_functions.py: Line 550: [['current_sid'], 'STRING', [], 'impala::UtilityFunctions::CurrentSid'], I spoke with Greg offline and he pointed out that there wasn't a clear use-case for current_sid, since there's nothing you can really do with the session id once you have it, e.g. cancel the session. Greg, do you see any risks in adding this now? My feeling is that it's ok to add it, but that we should call it current_session/CurrentSession and make current_sid an alias for it only. Line 551: [['user', 'current_user'], 'STRING', [], 'impala::UtilityFunctions::User'], I think you have current_user and session_user the wrong way around. session_user would be the user that initiated the session, so session_user doesn't change with impersonation, but current_user does change. E.g. see http://www.ibm.com/support/knowledgecenter/SSULQD_7.1.0/com.ibm.nz.adv.doc/c_advsec_masquerading.html -- To view, visit http://gerrit.cloudera.org:8080/4063 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9b5d1009bbf42acc175a942d2df484e1c64822ca Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zoltan Ivanfi Gerrit-Reviewer: Greg Rahn Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Ivanfi Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3629: Codegen TransferScratchTuples() in hdfs-parquet-scanner
Thomas Tauber-Marshall has posted comments on this change. Change subject: IMPALA-3629: Codegen TransferScratchTuples() in hdfs-parquet-scanner .. Patch Set 1: Code-Review+1 Replaces: https://gerrit.cloudera.org/#/c/3774/ -- To view, visit http://gerrit.cloudera.org:8080/4093 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ic327e437c7cd2b3f92cdb11c1e907bfee2d44ee8 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: No
[Impala-CR](cdh5-trunk) IMPALA-3629: Codegen TransferScratchTuples() in hdfs-parquet-scanner
Thomas Tauber-Marshall has abandoned this change. Change subject: IMPALA-3629: Codegen TransferScratchTuples() in hdfs-parquet-scanner .. Abandoned Replaced with: https://gerrit.cloudera.org/#/c/4093/ -- To view, visit http://gerrit.cloudera.org:8080/3774 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: abandon Gerrit-Change-Id: Ic327e437c7cd2b3f92cdb11c1e907bfee2d44ee8 Gerrit-PatchSet: 11 Gerrit-Project: Impala Gerrit-Branch: cdh5-trunk Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-3629: Codegen TransferScratchTuples() in hdfs-parquet-scanner
Thomas Tauber-Marshall has uploaded a new change for review. http://gerrit.cloudera.org:8080/4093 Change subject: IMPALA-3629: Codegen TransferScratchTuples() in hdfs-parquet-scanner .. IMPALA-3629: Codegen TransferScratchTuples() in hdfs-parquet-scanner We currently don't do any codegen in the hdfs-parquet scanner, which limits performance. This patch creates a new function, ProcessScratchBatch, which contains the inner loop of TransferScratchTuples, allowing us to cross-compile only this performance critical part. It also uses CodegenEvalConjuncts to replace the call to EvalConjuncts in ProcessScratchBatch with a codegen'd version. Additionally, it modifies the Codegen functions in hdfs-avro/text/sequence-scanner to take an output parameter for the codegen'd function and return a Status in order to improve logging around codegen failures. Change-Id: Ic327e437c7cd2b3f92cdb11c1e907bfee2d44ee8 --- M be/src/codegen/gen_ir_descriptions.py M be/src/codegen/impala-ir.cc M be/src/exec/CMakeLists.txt M be/src/exec/hdfs-avro-scanner.cc M be/src/exec/hdfs-avro-scanner.h A be/src/exec/hdfs-parquet-scanner-ir.cc M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-parquet-scanner.h M be/src/exec/hdfs-scan-node.cc M be/src/exec/hdfs-scanner.cc M be/src/exec/hdfs-scanner.h M be/src/exec/hdfs-sequence-scanner.cc M be/src/exec/hdfs-sequence-scanner.h M be/src/exec/hdfs-text-scanner.cc M be/src/exec/hdfs-text-scanner.h 15 files changed, 276 insertions(+), 128 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/93/4093/1 -- To view, visit http://gerrit.cloudera.org:8080/4093 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: Ic327e437c7cd2b3f92cdb11c1e907bfee2d44ee8 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-1659: Netezza compatibility functions: metadata
Greg Rahn has posted comments on this change. Change subject: IMPALA-1659: Netezza compatibility functions: metadata .. Patch Set 2: > > My recommendation would be to skip non-obvious > > implementations (like CURRENT_SCHEMA) for now until we can > discuss > > further. > > I removed CURRENT_SCHEMA() as you suggested. Should I remove any > other function? I think we're good as the others are straightforward. Thanks! -- To view, visit http://gerrit.cloudera.org:8080/4063 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9b5d1009bbf42acc175a942d2df484e1c64822ca Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zoltan Ivanfi Gerrit-Reviewer: Greg Rahn Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Ivanfi Gerrit-HasComments: No
[Impala-ASF-CR] Infomation schema (preview for frontend)
Tim Armstrong has posted comments on this change. Change subject: Infomation_schema (preview for frontend) .. Patch Set 6: (18 comments) Did a pass over it and made some general comments. I think we should do some cleanup and renaming of things before getting more eyes on it, so that the review will go as smoothly as possible for everyone. http://gerrit.cloudera.org:8080/#/c/3863/6/be/src/exec/info-schema-scan-node.cc File be/src/exec/info-schema-scan-node.cc: PS6, Line 100: ColNames Constants should be named like: COL_NAMES PS6, Line 136: tuple_pool I think we can get rid of the 'tuple_pool' variable and just use row_batch->tuple_data_pool() directly. (this pattern appears in other scanners for historical reasons) Line 153: while (!ReachedLimit() && !row_batch->AtCapacity() This loop looks sufficient to me to materialise any number of rows. Line 168: idx++; Our coding standard says to use '++idx'. Not a big deal but I think it's easiest to fix now. Line 192: // TODO need to discuss what should be print here You can actually remove DebugString(). We don't have it for all scan nodes. E.g. HdfsScanNode doesn't have an implementation. http://gerrit.cloudera.org:8080/#/c/3863/6/fe/src/main/java/com/cloudera/impala/analysis/AnalysisContext.java File fe/src/main/java/com/cloudera/impala/analysis/AnalysisContext.java: Line 532: case VIEW_METADATA: Can you add a TODO like: // TODO before committing: implement authorization for system tables So that other reviewers know we intend to fix this. Line 533: case SELECT: There are some tabs instead of spaces here. Let's just figure out how to set up your text editor to use spaces instead of tabs, it will save time overall. http://gerrit.cloudera.org:8080/#/c/3863/6/fe/src/main/java/com/cloudera/impala/catalog/InformationSchemaTable.java File fe/src/main/java/com/cloudera/impala/catalog/InformationSchemaTable.java: PS6, Line 34: InformationSchemaTable I spent some time thinking about this and I'm thinking we should call this SystemTable/SystemDatabase. Since information_schema is a special kind of system database that shows catalog metadata, and probably shouldn't include other non-catalog metadata. Line 45: Your text editor is leaving in trailing whitespace - it should be possible to configure it to avoid this (it will save time in the long run). PS6, Line 52: Name We should keep the column names lower-case (they're converted to lower-case anyway). Line 89: // do nothing since isLoaded already Is this ever called? I think since isLoaded() is always true, this should never be called. If that's correct, we should just have this be: Preconditions.checkState(false); To assert that load() is never called for this table type. Line 109: // TODO Auto-generated method stub Comment not needed. http://gerrit.cloudera.org:8080/#/c/3863/6/fe/src/main/java/com/cloudera/impala/planner/InfoSchemaScanNode.java File fe/src/main/java/com/cloudera/impala/planner/InfoSchemaScanNode.java: Let's clean up the formatting in this file before getting others to review it - it's hard to read in gerrit. PS6, Line 42: InfoSchemaScanNode Maybe this should be SystemTableScanNode (assuming we do the rename I suggested). PS6, Line 45: KuduScanNode Needs updating Line 71: // TODO: Does the port matter? Add a TODO to have one scan range per backend (so other reviewers can see what we are planning). http://gerrit.cloudera.org:8080/#/c/3863/6/fe/src/main/java/com/cloudera/impala/planner/SingleNodePlanner.java File fe/src/main/java/com/cloudera/impala/planner/SingleNodePlanner.java: PS6, Line 1256: tblRef.getTable() Let's do some cleanup here by changing this to 'table'. I was confused why it was different. http://gerrit.cloudera.org:8080/#/c/3863/6/testdata/workloads/functional-planner/queries/PlannerTest/info-schema.test File testdata/workloads/functional-planner/queries/PlannerTest/info-schema.test: When I open this file locally there are a bunch of nonsense characters. I think that's why it thinks this is a binary file. 00:SCAN INFORMATION_SCHEMA [information_schema.impala_metrics]^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@ -- To view, visit http://gerrit.cloudera.org:8080/3863 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I7adbeb45220c468e43b424d70c30b952f6cec2cd Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Kathy Sun Gerrit-Reviewer: Kathy Sun Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3938: Prevent illegal implicit collection references being used.
Christopher Channing has uploaded a new patch set (#2). Change subject: IMPALA-3938: Prevent illegal implicit collection references being used. .. IMPALA-3938: Prevent illegal implicit collection references being used. The Bug: Prior to this patch Impala would accept and execute a query of the form "select pos, key, value from tbl.arr_map_col order by pos asc", where "arr_map_col" is an array of map. In this example, the map is not referenced as a collection and therefore the implicit rule of using "item" should not be applied. The results produced from this erroneous query are arbitrary. The semantically correct way of expressing the query would be of the form "select a.pos, m.key, m.value from tbl t, t.arr_map_col a, a.item m order by a.pos asc". The Fix: During query analysis (more specifically: path resolution), if part of a path cannot be explicitly matched then the planner will fall back to an implicit field check. As part of this check, the type of the nested field inside the collection will be inspected to verify that it does not match that of a MapType. If so, the path resolution will fail and an AnalysisException will be thrown indicating that the path cannot be resolved. For example, trying to execute the query "select pos, key, value from tbl.arr_map_col order by pos asc" would yield the following error "Could not resolve column/field reference: 'key'". Regarding test coverage, the existing test suite "AnalyzeStmtsTest" has been augmented with a set of new tests that will verify that the illegal implicit collection reference from Array->Map cannot be used. Change-Id: I703c7e9c8c9c95c9fee714767367373eec10cd0c --- M fe/src/main/java/com/cloudera/impala/analysis/Path.java M fe/src/test/java/com/cloudera/impala/analysis/AnalyzeStmtsTest.java 2 files changed, 33 insertions(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/79/4079/2 -- To view, visit http://gerrit.cloudera.org:8080/4079 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I703c7e9c8c9c95c9fee714767367373eec10cd0c Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Christopher Channing Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Christopher Channing
[Impala-ASF-CR] IMPALA-3938: Prevent illegal implicit collection references being used.
Christopher Channing has posted comments on this change. Change subject: IMPALA-3938: Prevent illegal implicit collection references being used. .. Patch Set 1: (3 comments) I made the changes that you requested. http://gerrit.cloudera.org:8080/#/c/4079/1/fe/src/main/java/com/cloudera/impala/analysis/Path.java File fe/src/main/java/com/cloudera/impala/analysis/Path.java: Line 232: if (!structType.isMapType() && field.getType() instanceof MapType) { > how could structType every be a map type? do you mean to call isMapStruct() Sorry, not sure how I missed that. I tried out your suggestion, but it did not catch the following broken query: "select key from test_table.locations" (same behaviour that the bug exhibits). For this broken scenario, the flag @ L219 could be set if the type is a collection and we look at the optional type, but it would mean duplicating a little code to recreate the structure (in order to get the optional field). Instead of that approach, I've reworked this a little to populate expectExplicitMatch inside the loop when we're dealing with a hierarchy of nested collections. http://gerrit.cloudera.org:8080/#/c/4079/1/fe/src/test/java/com/cloudera/impala/analysis/AnalyzeStmtsTest.java File fe/src/test/java/com/cloudera/impala/analysis/AnalyzeStmtsTest.java: Line 438: // Array of structs and maps with name conflicts. Both implicit and explicit > your new tests don't really belong in this test 'cluster' because there are Done Line 467: "Could not resolve column/field reference: 'm.value'"); > whitespace Done -- To view, visit http://gerrit.cloudera.org:8080/4079 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I703c7e9c8c9c95c9fee714767367373eec10cd0c Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Christopher Channing Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Christopher Channing Gerrit-HasComments: Yes
[Impala-ASF-CR] Infomation schema (preview for frontend)
Tim Armstrong has posted comments on this change. Change subject: Infomation_schema (preview for frontend) .. Patch Set 6: Could you run clang-format on the backend to clean up the formatting? If you get .clang_format from https://gerrit.cloudera.org/#/c/3886/ and add these directories to your path: export PATH=$IMPALA_TOOLCHAIN/llvm-3.8.0-p1/bin:$IMPALA_TOOLCHAIN/llvm-3.8.0-p1/share/clang/:$PATH Then you can run this command to reformat your last commit: git diff -U0 HEAD^ | clang-format-diff.py -i -p1 -- To view, visit http://gerrit.cloudera.org:8080/3863 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I7adbeb45220c468e43b424d70c30b952f6cec2cd Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Kathy Sun Gerrit-Reviewer: Kathy Sun Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3820: Handle linkage errors while loading Java UDFs in Catalog
Internal Jenkins has submitted this change and it was merged. Change subject: IMPALA-3820: Handle linkage errors while loading Java UDFs in Catalog .. IMPALA-3820: Handle linkage errors while loading Java UDFs in Catalog Currently Catalog aborts at startup if the class loader loading Java UDFs throws a LinkageError (e.g., NoClassDefError). The code has been designed to catch a generic "Exception" and ignore any incompatible or erroneous UDFs but it can't handle any "Error"s. Java generally doesn't encourage to handle all types of "Error"s, however given the Catalog tries to load thirdparty udfs, it is possible that there might be missing dependencies that the Catalog is expected to handle. Testing: Added an e-e test that loads a corrupt Java UDF from Hive and restarts Catalog to make sure its up and the function count tallies. Change-Id: I1109614f1617caa9bb27e8c151e902aae71a6b41 Reviewed-on: http://gerrit.cloudera.org:8080/4092 Reviewed-by: Bharath Vissapragada Tested-by: Internal Jenkins --- M fe/src/main/java/com/cloudera/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/com/cloudera/impala/hive/executor/UdfExecutor.java M tests/custom_cluster/test_permanent_udfs.py M tests/test-hive-udfs/pom.xml A tests/test-hive-udfs/src/main/java/com/cloudera/impala/UnresolvedUdf.java 5 files changed, 77 insertions(+), 2 deletions(-) Approvals: Bharath Vissapragada: Looks good to me, approved Internal Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/4092 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I1109614f1617caa9bb27e8c151e902aae71a6b41 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bharath Vissapragada Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Internal Jenkins
[Impala-ASF-CR] IMPALA-3820: Handle linkage errors while loading Java UDFs in Catalog
Internal Jenkins has posted comments on this change. Change subject: IMPALA-3820: Handle linkage errors while loading Java UDFs in Catalog .. Patch Set 1: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/4092 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I1109614f1617caa9bb27e8c151e902aae71a6b41 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bharath Vissapragada Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Internal Jenkins Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4006: dangerous rm -rf statements in scripts
Zoltan Ivanfi has uploaded a new patch set (#3). Change subject: IMPALA-4006: dangerous rm -rf statements in scripts .. IMPALA-4006: dangerous rm -rf statements in scripts Quoted variable substitutions in rm -rf commands and in many other places. This prevents disasters if those variables contain whitespace. Redirected output of the cd commands to /dev/null. This prevents polluting the target variable with the directory name when the CDPATH environment variable is set. Change-Id: I7503794180dee99eeb979e67f34e3b2edade70fe --- M bin/clean.sh M bin/impala-config.sh M bin/impala-python M bin/impala-python-common.sh M bin/run-all-tests.sh M bin/set-classpath.sh M buildall.sh M infra/python/deps/download_requirements 8 files changed, 88 insertions(+), 88 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/78/4078/3 -- To view, visit http://gerrit.cloudera.org:8080/4078 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I7503794180dee99eeb979e67f34e3b2edade70fe Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zoltan Ivanfi Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Ivanfi
[Impala-ASF-CR] IMPALA-4006: impala-config.sh contains dangerous rm -rf statements
Zoltan Ivanfi has posted comments on this change. Change subject: IMPALA-4006: impala-config.sh contains dangerous rm -rf statements .. Patch Set 1: (1 comment) > I believe you can avoid the spaces-in-lists-of-files problems using > bash arrays but I don't know if it's worth the effort The problem with bash arrays is that they can not be exported, their scope is limited to the current shell process. http://gerrit.cloudera.org:8080/#/c/4078/1/bin/run-all-tests.sh File bin/run-all-tests.sh: PS1, Line 101: LOG_DIR="${IMPALA_EE_TEST_LOGS_DIR}" > It's used by run-step Done -- To view, visit http://gerrit.cloudera.org:8080/4078 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I7503794180dee99eeb979e67f34e3b2edade70fe Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zoltan Ivanfi Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Ivanfi Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3893, IMPALA-3901: impala-shell prints incorrect coordinator address, overly verbose
Internal Jenkins has posted comments on this change. Change subject: IMPALA-3893, IMPALA-3901: impala-shell prints incorrect coordinator address, overly verbose .. Patch Set 6: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/3994 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9d167b66f2dd8629e40a7094d21ea7ce6b43d23b Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Sailesh Mukil Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Sailesh Mukil Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3986: Python hive-client may fail silently while dropping partitions
Internal Jenkins has posted comments on this change. Change subject: IMPALA-3986: Python hive-client may fail silently while dropping partitions .. Patch Set 2: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/4083 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I4241225d19c75a79179a61fef0c75c06a9a2ac39 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Sailesh Mukil Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Sailesh Mukil Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3986: Python hive-client may fail silently while dropping partitions
Internal Jenkins has submitted this change and it was merged. Change subject: IMPALA-3986: Python hive-client may fail silently while dropping partitions .. IMPALA-3986: Python hive-client may fail silently while dropping partitions It was noticed during a test run that a partition existed in the HMS when it shouldn't have. The only possible explanation for that is that the python based hive client failed to drop the partition when asked to and we did not check for an error when it did so. This patch just adds an assert to make sure that the partition is dropped when instructed to, else it will cause the test to fail in the right place. Change-Id: I4241225d19c75a79179a61fef0c75c06a9a2ac39 Reviewed-on: http://gerrit.cloudera.org:8080/4083 Reviewed-by: Matthew Jacobs Tested-by: Internal Jenkins --- M tests/common/impala_test_suite.py 1 file changed, 2 insertions(+), 1 deletion(-) Approvals: Matthew Jacobs: Looks good to me, approved Internal Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/4083 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I4241225d19c75a79179a61fef0c75c06a9a2ac39 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Sailesh Mukil Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Sailesh Mukil
[Impala-ASF-CR] IMPALA-3820: Handle linkage errors while loading Java UDFs in Catalog
Bharath Vissapragada has posted comments on this change. Change subject: IMPALA-3820: Handle linkage errors while loading Java UDFs in Catalog .. Patch Set 1: Carried +2 from Alex. -- To view, visit http://gerrit.cloudera.org:8080/4092 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I1109614f1617caa9bb27e8c151e902aae71a6b41 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bharath Vissapragada Gerrit-Reviewer: Bharath Vissapragada Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3820: Handle linkage errors while loading Java UDFs in Catalog
Bharath Vissapragada has posted comments on this change. Change subject: IMPALA-3820: Handle linkage errors while loading Java UDFs in Catalog .. Patch Set 1: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/4092 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I1109614f1617caa9bb27e8c151e902aae71a6b41 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bharath Vissapragada Gerrit-Reviewer: Bharath Vissapragada Gerrit-HasComments: No
[Impala-CR](cdh5-trunk) IMPALA-3820: Handle linkage errors while loading Java UDFs in Catalog
Bharath Vissapragada has abandoned this change. Change subject: IMPALA-3820: Handle linkage errors while loading Java UDFs in Catalog .. Abandoned Thanks Alex. Moving this to asf. http://gerrit.cloudera.org:8080/4092 -- To view, visit http://gerrit.cloudera.org:8080/3567 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: abandon Gerrit-Change-Id: I1109614f1617caa9bb27e8c151e902aae71a6b41 Gerrit-PatchSet: 5 Gerrit-Project: Impala Gerrit-Branch: cdh5-trunk Gerrit-Owner: Bharath Vissapragada Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis
[Impala-ASF-CR] IMPALA-1731,IMPALA-3868: Float values are not parsed correctly
Internal Jenkins has posted comments on this change. Change subject: IMPALA-1731,IMPALA-3868: Float values are not parsed correctly .. Patch Set 4: Verified-1 Build failed: http://sandbox.jenkins.cloudera.com/job/impala-external-gerrit-verify-merge-ASF/119/ -- To view, visit http://gerrit.cloudera.org:8080/3791 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9e17d0f051b300a22a520ce34e276c2d4460d35e Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Attila Jeges Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Ho Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3820: Handle linkage errors while loading Java UDFs in Catalog
Bharath Vissapragada has uploaded a new change for review. http://gerrit.cloudera.org:8080/4092 Change subject: IMPALA-3820: Handle linkage errors while loading Java UDFs in Catalog .. IMPALA-3820: Handle linkage errors while loading Java UDFs in Catalog Currently Catalog aborts at startup if the class loader loading Java UDFs throws a LinkageError (e.g., NoClassDefError). The code has been designed to catch a generic "Exception" and ignore any incompatible or erroneous UDFs but it can't handle any "Error"s. Java generally doesn't encourage to handle all types of "Error"s, however given the Catalog tries to load thirdparty udfs, it is possible that there might be missing dependencies that the Catalog is expected to handle. Testing: Added an e-e test that loads a corrupt Java UDF from Hive and restarts Catalog to make sure its up and the function count tallies. Change-Id: I1109614f1617caa9bb27e8c151e902aae71a6b41 --- M fe/src/main/java/com/cloudera/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/com/cloudera/impala/hive/executor/UdfExecutor.java M tests/custom_cluster/test_permanent_udfs.py M tests/test-hive-udfs/pom.xml A tests/test-hive-udfs/src/main/java/com/cloudera/impala/UnresolvedUdf.java 5 files changed, 77 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/92/4092/1 -- To view, visit http://gerrit.cloudera.org:8080/4092 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I1109614f1617caa9bb27e8c151e902aae71a6b41 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bharath Vissapragada