[ https://issues.apache.org/jira/browse/IMPALA-9338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17385079#comment-17385079 ]
ASF subversion and git services commented on IMPALA-9338:
---------------------------------------------------------

Commit 90906ec68c93795b39a82f339c299a6fb363808d in impala's branch refs/heads/master from Yida Wu
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=90906ec ]

IMPALA-9338 Fix impala crashing in impala::RowDescriptor::TupleIsNullable(int)

The patch fixes a bug in the orderConjunctsByCost function, which could remove the wrong object from the list when the first conjunct is not the best one and the list contains the same conjunct written with different letter case. After reordering, the list could contain duplicate objects, because the conjunct that had already been added to the return list was still in the remaining list. This later led to a wrong plan in which each side of the JOIN referenced columns from the other side, due to a double flip of the same conjunct: there are two conjuncts in the list, and each is flipped as required by the analyzer, but the two conjuncts are the same object.

The root cause is that some parts of the analyzer are case-sensitive and others are not. For example, the List's remove() treats conjuncts that differ only in letter case as the same because they refer to the same columns, while the String's compareTo() is case-sensitive. This discrepancy creates unexpected bugs.

The fix removes the conjunct from the remaining list by index instead of by object. However, other places in the code may still have similar issues with letter case; a consistent case-sensitivity policy in SQL analysis would help avoid such bugs.

Regression test cases have been added to queries/tpch-outer-joins and PlannerTest/join-order.

Tests:
Ran the Core FE_TEST and EE_TEST. Passed the regression tests in tpch-outer-joins and PlannerTest/join-order.

Change-Id: I2ba031d7a6eda21a77b0e53bc41772ee9e00a528
Reviewed-on: http://gerrit.cloudera.org:8080/17610
Reviewed-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>

> Impala crashing in impala::RowDescriptor::TupleIsNullable(int)
> --------------------------------------------------------------
>
> Key: IMPALA-9338
> URL: https://issues.apache.org/jira/browse/IMPALA-9338
> Project: IMPALA
> Issue Type: Bug
> Components: Backend
> Affects Versions: Impala 3.3.0
> Reporter: Abhishek Rawat
> Assignee: Yida Wu
> Priority: Blocker
> Labels: crash
>
> Repro:
> {code:java}
> create database default;
> CREATE EXTERNAL TABLE default.dimension (
>   ssn_id INT, act_num CHAR(1), eff_dt CHAR(10), seq_num SMALLINT,
>   entry_dt CHAR(10), map ARRAY<INT>, src CHAR(10), msg CHAR(1),
>   msg_num CHAR(3), remarks CHAR(3), description CHAR(26),
>   default_load_ts CHAR(26), map_cd VARCHAR(50) )
> PARTITIONED BY ( year INT, ssn_hash INT )
> ROW FORMAT DELIMITED FIELDS TERMINATED BY '\u001C'
> WITH SERDEPROPERTIES ('colelction.delim'=',', 'field.delim'='\u001C', 'serialization.format'='\u001C')
> STORED AS PARQUET
> --LOCATION 'hdfs://prdnameservice/user/hive/warehouse/default.db/dimension'
> TBLPROPERTIES ('DO_NOT_UPDATE_STATS'='true', 'STATS_GENERATED'='TASK',
>   'STATS_GENERATED_VIA_STATS_TASK'='true', 'impala.lastComputeStatsTime'='1579246708',
>   'last_modified_by'='a00811p', 'last_modified_time'='1489791214',
>   'numRows'='7357715311', 'totalSize'='235136295799');
> CREATE EXTERNAL TABLE default.fact (
>   ssn_id_n INT, bor_act_sfx CHAR(1), start_dt CHAR(10), seq_num SMALLINT,
>   msg_n CHAR(8), end_dt CHAR(10), reviews CHAR(50), description CHAR(50),
>   detail CHAR(50), default_load_ts CHAR(26) )
> PARTITIONED BY ( year INT, ssn_hash INT )
> ROW FORMAT DELIMITED FIELDS TERMINATED BY '\u0016'
> WITH SERDEPROPERTIES ('field.delim'='\u0016', 'serialization.format'='\u0016')
> STORED AS PARQUET
> --LOCATION 'hdfs://prdnameservice/user/hive/warehouse/default.db/fact'
> TBLPROPERTIES ('DO_NOT_UPDATE_STATS'='true', 'STATS_GENERATED'='TASK',
>   'STATS_GENERATED_VIA_STATS_TASK'='true', 'impala.lastComputeStatsTime'='1579242111',
>   'last_modified_by'='e32940', 'last_modified_time'='1484186332',
>   'numRows'='5142832439', 'totalSize'='105397898347');
> use default;
> select ssn_id_n, bor_act_sfx, amap.item, start_dt, reviews, concat(msg, msg_num) corr_code
> from dimension, dimension.map amap
> LEFT JOIN fact
> ON dimension.ssn_id = fact.ssn_id_n
> AND dimension.act_num = fact.bor_act_sfx
> AND dimension.eff_dt = fact.start_dt
> and dimension.year = fact.year
> --and dimension.month(cast(eff_dt as timestamp)) = fact.month(cast(start_dt as timestamp))
> AND dimension.YEAR = fact.YEAR
> AND fact.year in (2018,2019)
> where dimension.msg like '%B295%'
> AND dimension.year in (2018,2019);{code}
> Stack Trace:
> {code:java}
> #0 0x0000000000f8b1b9 in impala::RowDescriptor::TupleIsNullable(int) const ()
> #1 0x000000000130911f in impala::SlotRef::Init(impala::RowDescriptor const&, impala::RuntimeState*) ()
> #2 0x000000000130748e in impala::ScalarExpr::Create(impala::TExpr const&, impala::RowDescriptor const&, impala::RuntimeState*, impala::ObjectPool*, impala::ScalarExpr**) ()
> #3 0x00000000013075e5 in impala::ScalarExpr::Create(std::vector<impala::TExpr, std::allocator<impala::TExpr> > const&, impala::RowDescriptor const&, impala::RuntimeState*, impala::ObjectPool*, std::vector<impala::ScalarExpr*, std::allocator<impala::ScalarExpr*> >*) ()
> #4 0x000000000130769f in impala::ScalarExpr::Create(std::vector<impala::TExpr, std::allocator<impala::TExpr> > const&, impala::RowDescriptor const&, impala::RuntimeState*, std::vector<impala::ScalarExpr*, std::allocator<impala::ScalarExpr*> >*) ()
> #5 0x000000000149c1aa in impala::KrpcDataStreamSender::Init(std::vector<impala::TExpr, std::allocator<impala::TExpr> > const&, impala::TDataSink const&, impala::RuntimeState*) ()
> #6 0x0000000001208ad3 in impala::DataSink::Create(impala::TPlanFragmentCtx const&, impala::TPlanFragmentInstanceCtx const&, impala::RowDescriptor const*, impala::RuntimeState*, impala::DataSink**) ()
> #7 0x0000000000fac9a4 in impala::FragmentInstanceState::Prepare() ()
> #8 0x0000000000fad3dd in impala::FragmentInstanceState::Exec() ()
> #9 0x0000000000f98e77 in impala::QueryState::ExecFInstance(impala::FragmentInstanceState*) ()
> #10 0x00000000011a1490 in impala::Thread::SuperviseThread(std::string const&, std::string const&, boost::function<void ()>, impala::ThreadDebugInfo const*, impala::Promise<long, (impala::PromiseMode)0>*) ()
> #11 0x00000000011a203a in boost::detail::thread_data<boost::_bi::bind_t<void, void (std::string const&, std::string const&, boost::function<void ()>, impala::ThreadDebugInfo const*, impala::Promise<long, (impala::PromiseMode)0>), boost::_bi::list5<boost::_bi::value<std::string>, boost::_bi::value<std::string>, boost::_bi::value<boost::function<void ()> >, boost::_bi::value<impala::ThreadDebugInfo>, boost::_bi::value<impala::Promise<long, (impala::PromiseMode)0>*> > > >::run() ()
> #12 0x00000000017909ca in thread_proxy ()
> #13 0x00007f8832fa6aa1 in __pthread_initialize_minimal_internal () from /lib64/libpthread.so.0
> #14 0x0000000000000000 in ?? ()
> {code}
>
> The crash only happens when the ROJ plan is selected. If the LOJ plan is selected, the query runs successfully.
> Initial investigation indicates that the scalar expression being constructed in the above stack trace references an invalid tupleId in the row descriptor.
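
The mismatch described in the commit message above, a case-insensitive equals() driving List.remove(Object) next to a case-sensitive String.compareTo(), can be reproduced in isolation. The Java sketch below is only an illustration under stated assumptions: the Conjunct class, its case-insensitive equals(), the use of compareTo() as a stand-in for the cost comparison, and both ordering methods are hypothetical and are not Impala's actual Expr or orderConjunctsByCost code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for a join conjunct; not Impala's Expr class. */
final class Conjunct {
    final String sql;

    Conjunct(String sql) {
        this.sql = sql;
    }

    // Assumption: equality ignores letter case, mirroring the commit's
    // statement that conjuncts referring to the same columns compare equal.
    @Override
    public boolean equals(Object other) {
        return other instanceof Conjunct
            && sql.equalsIgnoreCase(((Conjunct) other).sql);
    }

    @Override
    public int hashCode() {
        return sql.toLowerCase().hashCode();
    }

    @Override
    public String toString() {
        return sql;
    }
}

final class ConjunctRemovalSketch {
    /** Picks the "best" conjunct with the case-sensitive String.compareTo(). */
    static int bestIndex(List<Conjunct> remaining) {
        int best = 0;
        for (int i = 1; i < remaining.size(); i++) {
            if (remaining.get(i).sql.compareTo(remaining.get(best).sql) < 0) {
                best = i;
            }
        }
        return best;
    }

    /** Buggy variant: remove(Object) drops the first element that equals() the best one. */
    static List<Conjunct> orderRemovingByObject(List<Conjunct> input) {
        List<Conjunct> remaining = new ArrayList<>(input);
        List<Conjunct> ordered = new ArrayList<>();
        while (!remaining.isEmpty()) {
            Conjunct best = remaining.get(bestIndex(remaining));
            ordered.add(best);
            remaining.remove(best); // may remove a different but equal object
        }
        return ordered;
    }

    /** Fixed variant: remove(int) drops exactly the element that was selected. */
    static List<Conjunct> orderRemovingByIndex(List<Conjunct> input) {
        List<Conjunct> remaining = new ArrayList<>(input);
        List<Conjunct> ordered = new ArrayList<>();
        while (!remaining.isEmpty()) {
            int idx = bestIndex(remaining);
            ordered.add(remaining.remove(idx));
        }
        return ordered;
    }

    /** True if the same object (by identity) appears more than once. */
    static boolean hasDuplicateObject(List<Conjunct> list) {
        for (int i = 0; i < list.size(); i++) {
            for (int j = i + 1; j < list.size(); j++) {
                if (list.get(i) == list.get(j)) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<Conjunct> input = Arrays.asList(
            new Conjunct("dimension.year = fact.year"),
            new Conjunct("dimension.YEAR = fact.YEAR"));
        System.out.println("by object: " + orderRemovingByObject(input));
        System.out.println("by index:  " + orderRemovingByIndex(input));
    }
}
```

With this two-conjunct input, the upper-case conjunct sorts first but sits second in the list, so the by-object variant removes the lower-case conjunct instead and returns the upper-case object twice. The by-index variant returns each conjunct exactly once.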