[jira] [Resolved] (IMPALA-7681) Support new URI scheme for ADLS Gen2
[ https://issues.apache.org/jira/browse/IMPALA-7681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joe McDonnell resolved IMPALA-7681. --- Resolution: Fixed Fix Version/s: Impala 3.1.0 > Support new URI scheme for ADLS Gen2 > > > Key: IMPALA-7681 > URL: https://issues.apache.org/jira/browse/IMPALA-7681 > Project: IMPALA > Issue Type: Bug >Reporter: Sean Mackrory >Assignee: Sean Mackrory >Priority: Major > Fix For: Impala 3.1.0 > > > HADOOP-15407 recently added a new FileSystem implementation called "ABFS" for > the ADLS Gen2 service. Instead of being in the hadoop-azure-datalake module, > it's in the hadoop-azure module as a replacement for WASB. > It should have pretty much the same filesystem semantics as ADLS, but URIs > are configured separately, so we'll need a new function to pick it up, even > if we treat it the same. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
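The "new function to pick it up" can be sketched as follows. This is a hypothetical Python illustration, not Impala's actual code: ADLS Gen2 registers the `abfs://` and `abfss://` schemes (plain and TLS), distinct from Gen1's `adl://`, so path-type checks need a separate predicate even though the filesystem semantics are treated the same.

```python
# Illustrative sketch: scheme-based detection of ADLS Gen1 vs. Gen2 (ABFS)
# paths. Function names are hypothetical, not Impala's.
from urllib.parse import urlparse

ADLS_GEN1_SCHEME = "adl"
ABFS_SCHEMES = ("abfs", "abfss")  # ADLS Gen2, plain and TLS variants

def is_adls_path(uri: str) -> bool:
    """True for ADLS Gen1 (adl://) paths."""
    return urlparse(uri).scheme == ADLS_GEN1_SCHEME

def is_abfs_path(uri: str) -> bool:
    """True for ADLS Gen2 (abfs:// or abfss://) paths, which share ADLS
    semantics but are configured under a separate URI scheme."""
    return urlparse(uri).scheme in ABFS_SCHEMES
```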
[jira] [Resolved] (IMPALA-5068) Some username mappings were not respected
[ https://issues.apache.org/jira/browse/IMPALA-5068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Apple resolved IMPALA-5068. --- Resolution: Workaround > Some username mappings were not respected > - > > Key: IMPALA-5068 > URL: https://issues.apache.org/jira/browse/IMPALA-5068 > Project: IMPALA > Issue Type: Sub-task > Components: Infrastructure >Reporter: Jim Apple >Priority: Major > > Some users requested specific usernames and had them rejected by the import > process in favor of existing usernames with the same email address. One > example is [~alanchoi]. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (IMPALA-5067) Sub-task order was not preserved
[ https://issues.apache.org/jira/browse/IMPALA-5067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Apple resolved IMPALA-5067. --- Resolution: Not A Problem > Sub-task order was not preserved > > > Key: IMPALA-5067 > URL: https://issues.apache.org/jira/browse/IMPALA-5067 > Project: IMPALA > Issue Type: Sub-task > Components: Infrastructure >Reporter: Jim Apple >Priority: Major > > Some issues had their sub-task order rearranged during import, like > IMPALA-3902 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IMPALA-7733) TestInsertParquetQueries.test_insert_parquet is flaky in S3 due to rename
Vuk Ercegovac created IMPALA-7733: - Summary: TestInsertParquetQueries.test_insert_parquet is flaky in S3 due to rename Key: IMPALA-7733 URL: https://issues.apache.org/jira/browse/IMPALA-7733 Project: IMPALA Issue Type: Bug Components: Infrastructure Affects Versions: Impala 3.1.0 Reporter: Vuk Ercegovac I see two examples in the past two months or so where this test fails due to a rename error on S3. The test's stacktrace looks like this: {noformat} query_test/test_insert_parquet.py:112: in test_insert_parquet self.run_test_case('insert_parquet', vector, unique_database, multiple_impalad=True) common/impala_test_suite.py:408: in run_test_case result = self.__execute_query(target_impalad_client, query, user=user) common/impala_test_suite.py:625: in __execute_query return impalad_client.execute(query, user=user) common/impala_connection.py:160: in execute return self.__beeswax_client.execute(sql_stmt, user=user) beeswax/impala_beeswax.py:176: in execute handle = self.__execute_query(query_string.strip(), user=user) beeswax/impala_beeswax.py:350: in __execute_query self.wait_for_finished(handle) beeswax/impala_beeswax.py:371: in wait_for_finished raise ImpalaBeeswaxException("Query aborted:" + error_log, None) E ImpalaBeeswaxException: ImpalaBeeswaxException: EQuery aborted:Error(s) moving partition files. 
First error (of 1) was: Hdfs op (RENAME s3a:///test_insert_parquet_968f37fe.db/orders_insert_table/_impala_insert_staging/4e45cd68bcddd451_3c7156ed/.4e45cd68bcddd451-3c7156ed0002_803672621_dir/4e45cd68bcddd451-3c7156ed0002_448261088_data.0.parq TO s3a:///test-warehouse/test_insert_parquet_968f37fe.db/orders_insert_table/4e45cd68bcddd451-3c7156ed0002_448261088_data.0.parq) failed, error was: s3a:///test-warehouse/test_insert_parquet_968f37fe.db/orders_insert_table/_impala_insert_staging/4e45cd68bcddd451_3c7156ed/.4e45cd68bcddd451-3c7156ed0002_803672621_dir/4e45cd68bcddd451-3c7156ed0002_448261088_data.0.parq E Error(5): Input/output error{noformat} Since we know this happens once in a while, some ideas to deflake it:
* retry
* check for this specific issue... if we think it's platform flakiness, then we should skip it.
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
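The "retry" idea above could be implemented as a small wrapper around the rename, retrying transient I/O errors a bounded number of times. A minimal sketch with hypothetical names, not Impala's actual code:

```python
# Sketch of retry-based deflaking for a flaky S3 rename. rename_fn is any
# callable that performs the rename and raises IOError on transient failure.
import time

def rename_with_retry(rename_fn, src, dst, attempts=3, backoff_s=1.0):
    """Call rename_fn(src, dst), retrying transient I/O errors with
    exponential backoff; re-raise after the final attempt."""
    for attempt in range(attempts):
        try:
            return rename_fn(src, dst)
        except IOError:
            if attempt == attempts - 1:
                raise  # out of retries; surface the original error
            time.sleep(backoff_s * (2 ** attempt))
```

Retrying only narrows the flake window; if the failure is pure platform flakiness, skipping the test on S3 (the other idea above) may still be the cleaner fix.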
[jira] [Resolved] (IMPALA-7590) Stress test hit inconsistent results with TPCDS-Q18A
[ https://issues.apache.org/jira/browse/IMPALA-7590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Tauber-Marshall resolved IMPALA-7590. Resolution: Cannot Reproduce > Stress test hit inconsistent results with TPCDS-Q18A > > > Key: IMPALA-7590 > URL: https://issues.apache.org/jira/browse/IMPALA-7590 > Project: IMPALA > Issue Type: Bug > Components: Backend >Affects Versions: Impala 3.0, Impala 2.12.0 >Reporter: Michael Ho >Assignee: Thomas Tauber-Marshall >Priority: Blocker > > Recent runs of stress test in a cluster with 135 nodes resulted in > inconsistent result every now and then for TPCDS-Q18a. The scale of TPC-DS is > 1. > {noformat} > --- result_correct.txt2018-09-10 08:54:30.427603941 -0700 > +++ result_incorrect.txt 2018-09-10 17:39:59.512926323 -0700 > @@ -1,3 +1,4 @@ > +opening > /tmp/stress/instance1/data/jenkins/workspace/impala-test-stress-secure-140node/archive/result_hashes/input.txt > > +--++--+---+---++--++--+-+--+ > | i_item_id| ca_country | ca_state | ca_county | agg1 | agg2 | > agg3 | agg4 | agg5 | agg6| agg7 | > > +--++--+---+---++--++--+-+--+ > @@ -13,7 +14,7 @@ > | AABM || IN | | 67.00 | 105.60 | > 2232.51 | 74.08 | -1114.55 | 1964.50 | 1.00 | > | AABNFAAA || IN | | 40.00 | 115.76 | > 0.00 | 70.61 | -459.60 | 1933.00 | 3.00 | > | AACBBAAA || IN | | 32.00 | 37.99 | > 0.00 | 8.73 | -448.64 | 1963.00 | 3.00 | > -| AACC || IN | | 56.00 | 2.50 | > 0.00 | 0.62 | -62.72 | NULL| 4.00 | > +| AACC || IN | | 56.00 | 2.50 | > 0.00 | 0.62 | -62.72 | 38463209| 4.00 | > | AACDCAAA || IN | | 30.00 | 53.19 | > 0.00 | 17.02 | -505.80 | 1990.00 | 6.00 | > | AACFDAAA || IN | | 58.00 | 113.96 | > 0.00 | 19.37 | -2148.90 | 1974.00 | 1.00 | > | AACHEAAA || IN | | 16.00 | 19.90 | > 0.00 | 13.13 | 9.76 | 1960.00 | 3.00 | > @@ -101,4 +102,4 @@ > | AAPKBAAA || IN | | 2.00 | 65.90 | > 0.00 | 58.65 | 60.24| 1954.00 | 3.00 | > | AAPO || IN | | 92.00 | 125.36 | > 0.00 | 94.02 | 1743.40 | 1963.00 | 6.00 | > | AAPODAAA || IN | | 75.00 | 119.08 | 
> 0.00 | 104.79 | 4501.50 | 1981.00 | 5.00 | > -+--++--+---+---++--++--+-+--+ > \ No newline at end of file > ++--++--+---+---++--++--+-+--+ > {noformat} > The problem is not reproducible by running the query at Impala shell. > The query is TPCDS Q18a: > {noformat} > with results as > (select i_item_id, > ca_country, > ca_state, > ca_county, > cast(cs_quantity as decimal(12,2)) agg1, > cast(cs_list_price as decimal(12,2)) agg2, > cast(cs_coupon_amt as decimal(12,2)) agg3, > cast(cs_sales_price as decimal(12,2)) agg4, > cast(cs_net_profit as decimal(12,2)) agg5, > cast(c_birth_year as decimal(12,2)) agg6, > cast(cd1.cd_dep_count as decimal(12,2)) agg7 > from catalog_sales, customer_demographics cd1, customer_demographics cd2, > customer, customer_address, date_dim, item > where cs_sold_date_sk = d_date_sk and >cs_item_sk = i_item_sk and >cs_bill_cdemo_sk = cd1.cd_demo_sk and >cs_bill_customer_sk = c_customer_sk and >cd1.cd_gender = 'F' and >cd1.cd_education_status = 'Unknown' and >c_current_cdemo_sk = cd2.cd_demo_sk and >c_current_addr_sk = ca_address_sk and >c_birth_month in (1, 6, 8, 9, 12, 2) and >d_year = 1998 and >ca_state in ('MS', 'IN', 'ND', 'OK', 'NM', 'VA', 'MS') > ) > select i_item_id, ca_country, ca_state, ca_county, agg1, agg2, agg3, agg4, > agg5, agg6, agg7 > from ( > select i_item_id, ca_country, ca_state, ca_county, avg(agg1) agg1, > avg(agg2) agg2, avg(agg3) agg3, avg(agg4) agg4, avg(agg5) agg5, avg(agg6) > agg6, avg(agg7) agg7 > from results > group by i_item_id, ca_country, ca_state, ca_county > union all > select i_item_id, ca_country, ca_state, NULL as county, avg(agg1) agg1, > avg(agg2) agg2, avg(agg3) agg3, > a
[jira] [Created] (IMPALA-7732) Check / Implement resource limits documented in IMPALA-5605
Michael Ho created IMPALA-7732: -- Summary: Check / Implement resource limits documented in IMPALA-5605 Key: IMPALA-7732 URL: https://issues.apache.org/jira/browse/IMPALA-7732 Project: IMPALA Issue Type: Task Components: Backend Affects Versions: Impala 2.12.0, Impala 3.0 Reporter: Michael Ho IMPALA-5605 documents a list of recommended bumps in system resource limits which may be necessary when running Impala at scale. We may consider checking those limits at startup with {{getrlimit()}} and potentially setting them with {{setrlimit()}} if possible. At a minimum, it may be helpful to log a warning message if a limit is below a certain threshold. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
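The startup check suggested above can be sketched with Python's `resource` module, which wraps `getrlimit()`/`setrlimit()`. The thresholds below are illustrative, not the actual values from IMPALA-5605:

```python
# Sketch of a startup-time rlimit check that collects warnings to log.
import resource

RECOMMENDED = {
    # Hypothetical thresholds for illustration only.
    "RLIMIT_NOFILE": 32768,
    "RLIMIT_NPROC": 65536,
}

def limit_warning(name, soft, recommended):
    """Return a warning string if the soft limit is below the recommended
    threshold, else None. RLIM_INFINITY is never below any threshold."""
    if soft != resource.RLIM_INFINITY and soft < recommended:
        return ("%s soft limit %d is below recommended %d; consider raising "
                "it via setrlimit() or ulimit" % (name, soft, recommended))
    return None

def check_limits():
    """Read the process's current limits and collect warnings to log."""
    warnings = []
    for name, recommended in RECOMMENDED.items():
        soft, _hard = resource.getrlimit(getattr(resource, name))
        msg = limit_warning(name, soft, recommended)
        if msg:
            warnings.append(msg)
    return warnings
```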
[jira] [Resolved] (IMPALA-5835) Severe slowdown in catalogd startup after 2.1 → 2.5 upgrade with > 200,000 databases
[ https://issues.apache.org/jira/browse/IMPALA-5835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] bharath v resolved IMPALA-5835. --- Resolution: Cannot Reproduce Fix Version/s: Not Applicable Please try out the latest bits and reopen if needed. > Severe slowdown in catalogd startup after 2.1 → 2.5 upgrade with > 200,000 > databases > > > Key: IMPALA-5835 > URL: https://issues.apache.org/jira/browse/IMPALA-5835 > Project: IMPALA > Issue Type: Bug > Components: Catalog >Affects Versions: Impala 2.6.0, Impala 2.7.0, Impala 2.5.5, Impala 2.8.0, > Impala 2.9.0, Impala 2.10.0 >Reporter: Ben Breakstone >Assignee: bharath v >Priority: Major > Labels: performance > Fix For: Not Applicable > > > After an upgrade from Impala 2.1 (CDH 5.3.9) to Impala 2.5 (CDH 5.7.5), > starting up Catalog Server takes around eight to ten hours. It took around > twenty minutes before the upgrade. > There are over 200,000 databases in use. Looking in the catalogd log as it > starts up for hours, it says > "Loading native functions for database..." and then > "Loading Java functions for database..." for each database. Based on this, it > appears the introduction of persistent UDFs/UDAs is causing the slowdown. > Only one of the databases actually has any UDFs defined. > num_metadata_loading_threads is set to 64. Background loading of metadata is > disabled. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (IMPALA-6026) Refresh table failed with "UnsupportedOperationException: null"
[ https://issues.apache.org/jira/browse/IMPALA-6026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] bharath v resolved IMPALA-6026. --- Resolution: Cannot Reproduce Fix Version/s: Not Applicable > Refresh table failed with "UnsupportedOperationException: null" > > > Key: IMPALA-6026 > URL: https://issues.apache.org/jira/browse/IMPALA-6026 > Project: IMPALA > Issue Type: Bug > Components: Catalog >Affects Versions: Impala 2.9.0 >Reporter: Juan Yu >Assignee: bharath v >Priority: Major > Fix For: Not Applicable > > > Invalidate metadata ts_part_200; > then >refresh ts_part_200; > it failed with following error. > The table has ~1.8K partitions, one file per partition. and it's an s3 table. > Note that each partition is under a different location, not all under the > same directory. > {code} > I1006 20:05:57.034777 20373 TableLoader.java:97] Loaded metadata for: > default.ts_part_200 > I1006 20:05:57.035403 6470 jni-util.cc:176] > org.apache.impala.catalog.TableLoadingException: Failed to load metadata for > table: ts_part_200 > at org.apache.impala.catalog.HdfsTable.load(HdfsTable.java:1091) > at org.apache.impala.catalog.HdfsTable.load(HdfsTable.java:1019) > at org.apache.impala.catalog.TableLoader.load(TableLoader.java:80) > at > org.apache.impala.catalog.TableLoadingMgr$2.call(TableLoadingMgr.java:237) > at > org.apache.impala.catalog.TableLoadingMgr$2.call(TableLoadingMgr.java:234) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.lang.UnsupportedOperationException > at > com.google.common.collect.ImmutableCollection.add(ImmutableCollection.java:91) > at > org.apache.impala.catalog.HdfsTable.synthesizeBlockMetadata(HdfsTable.java:386) > at > 
org.apache.impala.catalog.HdfsTable.loadBlockMetadata(HdfsTable.java:297) > at > org.apache.impala.catalog.HdfsTable.loadMetadataAndDiskIds(HdfsTable.java:771) > at > org.apache.impala.catalog.HdfsTable.loadAllPartitions(HdfsTable.java:689) > at org.apache.impala.catalog.HdfsTable.load(HdfsTable.java:1082) > ... 8 more > I1006 20:05:57.037637 6470 status.cc:122] TableLoadingException: Failed to > load metadata for table: ts_part_200 > CAUSED BY: UnsupportedOperationException: null > @ 0x83d879 impala::Status::Status() > @ 0xb98610 impala::JniUtil::GetJniExceptionMsg() > @ 0x8302eb impala::Catalog::ResetMetadata() > @ 0x82366b CatalogServiceThriftIf::ResetMetadata() > @ 0x8f69dd > impala::CatalogServiceProcessor::process_ResetMetadata() > @ 0x8f2b39 impala::CatalogServiceProcessor::dispatchCall() > @ 0x80e08c apache::thrift::TDispatchProcessor::process() > @ 0xa0124f > apache::thrift::server::TAcceptQueueServer::Task::run() > @ 0x9fb939 impala::ThriftThread::RunRunnable() > @ 0x9fc392 > boost::detail::function::void_function_obj_invoker0<>::invoke() > @ 0xbef309 impala::Thread::SuperviseThread() > @ 0xbefcc4 boost::detail::thread_data<>::run() > @ 0xe5810a (unknown) > @ 0x30e5c07aa1 (unknown) > @ 0x30e58e8bcd (unknown) > E1006 20:05:57.037645 6470 catalog-server.cc:82] TableLoadingException: > Failed to load metadata for table: ts_part_200 > CAUSED BY: UnsupportedOperationException: null > I1006 20:05:57.041628 6470 catalog-server.cc:86] ResetMetadata(): > response=TResetMetadataResponse { > 01: result (struct) = TCatalogUpdateResult { > 01: catalog_service_id (struct) = TUniqueId { > 01: hi (i64) = 0, > 02: lo (i64) = 0, > }, > 02: version (i64) = 0, > 03: status (struct) = TStatus { > 01: status_code (i32) = 2, > 02: error_msgs (list) = list[1] { > [0] = "TableLoadingException: Failed to load metadata for table: > ts_part_200\nCAUSED BY: UnsupportedOperationException: null", > }, > }, > }, > } > I1006 20:05:57.041652 6470 rpc-trace.cc:200] RPC call: > 
catalog-server:CatalogService.ResetMetadata from 10.0.0.240:55490 took > 30s715ms > I1006 20:05:57.346750 4595 rpc-trace.cc:190] RPC call: > StatestoreSubscriber.Heartbeat(from 10.0.0.200:52383) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (IMPALA-7708) Switch to faster compression strategy for incremental stats
[ https://issues.apache.org/jira/browse/IMPALA-7708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] bharath v resolved IMPALA-7708. --- Resolution: Fixed Fix Version/s: Impala 3.1.0 > Switch to faster compression strategy for incremental stats > --- > > Key: IMPALA-7708 > URL: https://issues.apache.org/jira/browse/IMPALA-7708 > Project: IMPALA > Issue Type: Improvement > Components: Catalog >Affects Versions: Impala 3.1.0 >Reporter: bharath v >Assignee: bharath v >Priority: Major > Fix For: Impala 3.1.0 > > > Currently we set the Deflater mode to BEST_COMPRESSION by default.
> {noformat}
> public static byte[] deflateCompress(byte[] input) {
>   if (input == null) return null;
>   ByteArrayOutputStream bos = new ByteArrayOutputStream(input.length);
>   // TODO: Benchmark other compression levels.
>   DeflaterOutputStream stream =
>       new DeflaterOutputStream(bos, new Deflater(Deflater.BEST_COMPRESSION));
> {noformat}
> In some experiments, we noticed that the fastest compression mode (BEST_SPEED) performs ~8x faster with only ~4% compression ratio penalty. Here are some results on a real world table with 3000 partitions with incremental stats.
>
> | |Time taken for serialization (seconds)|OutputBytes size (MB)|
> |Gzip best compression|92|194|
> |Gzip fastest compression|11|212|
> |Gzip default compression|57|195|
> |No compression|5|452|
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
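The speed/size tradeoff described above can be reproduced with `zlib` (the same DEFLATE algorithm the Java `Deflater` wraps): level 1 corresponds to `BEST_SPEED` and level 9 to `BEST_COMPRESSION`. The payload below is only a stand-in for serialized incremental stats:

```python
# Sketch: DEFLATE at the two extreme compression levels. Both round-trip
# losslessly; level 1 trades some output size for much less CPU time.
import zlib

def deflate(data: bytes, level: int) -> bytes:
    """DEFLATE-compress data; level 1 ~ Deflater.BEST_SPEED,
    level 9 ~ Deflater.BEST_COMPRESSION."""
    return zlib.compress(data, level)

# Repetitive input loosely mimicking serialized per-partition stats.
payload = b"partition;ndv=12345;nulls=0;" * 10000
fastest = deflate(payload, 1)   # larger output, much quicker to produce
smallest = deflate(payload, 9)  # smallest output, slowest to produce
```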
[jira] [Resolved] (IMPALA-7689) Improve size estimate for incremental stats
[ https://issues.apache.org/jira/browse/IMPALA-7689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] bharath v resolved IMPALA-7689. --- Resolution: Fixed Fix Version/s: Impala 3.1.0 > Improve size estimate for incremental stats > --- > > Key: IMPALA-7689 > URL: https://issues.apache.org/jira/browse/IMPALA-7689 > Project: IMPALA > Issue Type: Improvement > Components: Catalog >Affects Versions: Impala 3.1.0 >Reporter: Vuk Ercegovac >Assignee: bharath v >Priority: Major > Fix For: Impala 3.1.0 > > > After compressing incremental stats, their size estimate is now too > conservative. We use that size estimate to block the functionality (see the > corresponding expr in analysis and serialization in catalogd), so without > adjusting the estimate, the functionality will be blocked unnecessarily. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (IMPALA-7669) Concurrent invalidate with compute (or drop) stats throws NPE.
[ https://issues.apache.org/jira/browse/IMPALA-7669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] bharath v resolved IMPALA-7669. --- Resolution: Fixed Fix Version/s: Impala 3.1.0 > Concurrent invalidate with compute (or drop) stats throws NPE. > -- > > Key: IMPALA-7669 > URL: https://issues.apache.org/jira/browse/IMPALA-7669 > Project: IMPALA > Issue Type: Bug > Components: Catalog >Affects Versions: Impala 3.1.0 >Reporter: bharath v >Assignee: bharath v >Priority: Critical > Fix For: Impala 3.1.0 > > > *This is a Catalog V2 only bug* > NPE is thrown when trying to getPartialInfo() from an IncompleteTable (result > of invalidate) and cause_ is null. > {noformat} > @Override > public TGetPartialCatalogObjectResponse getPartialInfo( > TGetPartialCatalogObjectRequest req) throws TableLoadingException { > Throwables.propagateIfPossible(cause_, TableLoadingException.class); > throw new TableLoadingException(cause_.getMessage()); <- > } > {noformat} > {noformat} > I1004 16:51:28.845305 85380 jni-util.cc:308] java.lang.NullPointerException > at > org.apache.impala.catalog.IncompleteTable.getPartialInfo(IncompleteTable.java:140) > at > org.apache.impala.catalog.CatalogServiceCatalog.getPartialCatalogObject(CatalogServiceCatalog.java:2171) > at > org.apache.impala.service.JniCatalog.getPartialCatalogObject(JniCatalog.java:236) > {noformat} > Actual caller stack trace is this. 
> {noformat} > I1004 16:51:21.666422 67179 Frontend.java:1086] Analyzing query: compute > stats ads > I1004 16:51:28.850023 67179 jni-util.cc:308] > org.apache.impala.catalog.local.LocalCatalogException: Could not load table > parnal.ads from metastore > at > org.apache.impala.catalog.local.LocalTable.loadTableMetadata(LocalTable.java:128) > at org.apache.impala.catalog.local.LocalTable.load(LocalTable.java:89) > at org.apache.impala.catalog.local.LocalDb.getTable(LocalDb.java:119) > at > org.apache.impala.analysis.StmtMetadataLoader.getMissingTables(StmtMetadataLoader.java:251) > at > org.apache.impala.analysis.StmtMetadataLoader.loadTables(StmtMetadataLoader.java:140) > at > org.apache.impala.analysis.StmtMetadataLoader.loadTables(StmtMetadataLoader.java:116) > at > org.apache.impala.service.Frontend.doCreateExecRequest(Frontend.java:1118) > at > org.apache.impala.service.Frontend.getTExecRequest(Frontend.java:1092) > at > org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1064) > at > org.apache.impala.service.JniFrontend.createExecRequest(JniFrontend.java:158) > Caused by: org.apache.thrift.TException: > TGetPartialCatalogObjectResponse(status:TStatus(status_code:GENERAL, > error_msgs:[NullPointerException: null]), lookup_status:OK) > at > org.apache.impala.catalog.local.CatalogdMetaProvider.sendRequest(CatalogdMetaProvider.java:354) > at > org.apache.impala.catalog.local.CatalogdMetaProvider.access$100(CatalogdMetaProvider.java:163) > at > org.apache.impala.catalog.local.CatalogdMetaProvider$5.call(CatalogdMetaProvider.java:565) > at > org.apache.impala.catalog.local.CatalogdMetaProvider$5.call(CatalogdMetaProvider.java:560) > at > org.apache.impala.catalog.local.CatalogdMetaProvider$1.call(CatalogdMetaProvider.java:411) > at > com.google.common.cache.LocalCache$LocalManualCache$1.load(LocalCache.java:4767) > at > com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3568) > at > 
com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2350) > at > com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2313) > at > com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2228) > at com.google.common.cache.LocalCache.get(LocalCache.java:3965) > at > com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4764) > at > org.apache.impala.catalog.local.CatalogdMetaProvider.loadWithCaching(CatalogdMetaProvider.java:407) > at > org.apache.impala.catalog.local.CatalogdMetaProvider.loadTable(CatalogdMetaProvider.java:556) > at > org.apache.impala.catalog.local.LocalTable.loadTableMetadata(LocalTable.java:126) > ... 9 more > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
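The NPE above comes from dereferencing `cause_` unconditionally. The general shape of the fix, sketched in Python terms with illustrative names (mirroring, not reproducing, `IncompleteTable.getPartialInfo()`), is to fall back to a generic message when no cause was recorded:

```python
# Sketch: null-safe propagation of a recorded load failure. When a table is
# incomplete only because it was just invalidated, no cause exists yet.
class TableLoadingException(Exception):
    pass

def get_partial_info(cause):
    """Re-raise the recorded load failure; fall back to a generic
    TableLoadingException when no cause was recorded."""
    if isinstance(cause, TableLoadingException):
        raise cause
    message = str(cause) if cause is not None else "table has not been loaded"
    raise TableLoadingException(message)
```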
[jira] [Closed] (IMPALA-7723) Recognize int64 timestamps in CREATE TABLE LIKE PARQUET
[ https://issues.apache.org/jira/browse/IMPALA-7723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Csaba Ringhofer closed IMPALA-7723. --- Resolution: Invalid > Recognize int64 timestamps in CREATE TABLE LIKE PARQUET > --- > > Key: IMPALA-7723 > URL: https://issues.apache.org/jira/browse/IMPALA-7723 > Project: IMPALA > Issue Type: Improvement > Components: Frontend >Reporter: Csaba Ringhofer >Priority: Minor > Labels: parquet > > IMPALA-5050 adds support for reading int64 encoded Parquet timestamps. These > columns have int64 physical type, and converted/logical types has to be used > to differentiate them from BIGINTs. These columns can be read both as BIGINTs > and TIMESTAMPs depending on the table's schema. > CREATE TABLE LIKE PARQUET could also convert these columns to TIMESTAMP > instead of BIGINT, but I decided to postpone adding this feature for two > reasons: > 1. It could break the following possible workflow: > - generate Parquet files (that contain int64 timestamps) with some tool > - use Impala's CREATE TABLE LIKE PARQUET + LOAD DATA to make it accessible as > a table > - run some queries that rely on interpreting these columns as integers > CAST (col as BIGINT) in the query would make this even worse, as it would > convert timestamp to unix time in seconds instead of micros/millis without > any warning. > 2. Adding support for int64 timestamps with nanoseconds precision will need > Impala's parquet-hadoop-bundle dependency to be bumped to a new major > version, which may contain incompatible API changes. > Note that parquet-hadoop-bundle is only used in CREATE TABLE LIKE PARQUET. > The C++ parts of Impala only rely on parquet.thrift, which can be updated > more easily. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
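The unit pitfall described above (CAST treating a millis/micros value as seconds) is easy to see in a small sketch. This is illustrative Python, not Impala code; the helper name and sample value are made up:

```python
# Sketch: converting an int64 Parquet timestamp requires honoring the
# logical type's unit rather than assuming unix seconds.
from datetime import datetime, timezone

def int64_to_timestamp(value: int, unit: str) -> datetime:
    """Convert an int64 Parquet timestamp to a UTC datetime, using the
    logical type's unit ('millis' or 'micros')."""
    divisor = {"millis": 10**3, "micros": 10**6}[unit]
    return datetime.fromtimestamp(value / divisor, tz=timezone.utc)

stored_millis = 1_539_000_000_000  # an instant stored at millisecond precision
# The same instant stored at microsecond precision is stored_millis * 1000;
# dividing by the wrong unit would shift the result by a factor of 1000.
```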
[jira] [Resolved] (IMPALA-7717) Partition id does not exist exception - Catalog V2
[ https://issues.apache.org/jira/browse/IMPALA-7717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] bharath v resolved IMPALA-7717. --- Resolution: Fixed Fix Version/s: Impala 3.1.0 > Partition id does not exist exception - Catalog V2 > -- > > Key: IMPALA-7717 > URL: https://issues.apache.org/jira/browse/IMPALA-7717 > Project: IMPALA > Issue Type: Bug > Components: Catalog >Reporter: bharath v >Assignee: bharath v >Priority: Critical > Fix For: Impala 3.1.0 > > Attachments: IMPALA-7717-repro.patch > > > Concurrent invalidates with partial RPC on partitioned tables can throw this > exception. > {noformat} > I1016 15:49:03.438048 30197 jni-util.cc:256] > java.lang.IllegalArgumentException: Partition id 162 does not exist > at > com.google.common.base.Preconditions.checkArgument(Preconditions.java:119) > at org.apache.impala.catalog.HdfsTable.getPartialInfo(HdfsTable.java:1711) > at > org.apache.impala.catalog.CatalogServiceCatalog.doGetPartialCatalogObject(CatalogServiceCatalog.java:2202) > at > org.apache.impala.catalog.CatalogServiceCatalog.getPartialCatalogObject(CatalogServiceCatalog.java:2141) > at > org.apache.impala.service.JniCatalog.getPartialCatalogObject(JniCatalog.java:237) > I1016 15:49:03.440939 30197 status.cc:129] IllegalArgumentException: > Partition id 162 does not exist > {noformat} > {noformat} > @Override > public TGetPartialCatalogObjectResponse getPartialInfo( > TGetPartialCatalogObjectRequest req) throws TableLoadingException { > > if (partIds != null) { > resp.table_info.partitions = > Lists.newArrayListWithCapacity(partIds.size()); > for (long partId : partIds) { > HdfsPartition part = partitionMap_.get(partId); > Preconditions.checkArgument(part != null, "Partition id %s does not > exist", > partId); < > {noformat} > The issue is that the invalidate command can reset the partition IDs and the > RPCs could look up with older IDs. 
> We should wrap this into an inconsistent metadata fetch exception and retry > rather than throwing an RTE. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IMPALA-7731) Add ratio between scanned and transmitted bytes to fragment instances
Lars Volker created IMPALA-7731: --- Summary: Add ratio between scanned and transmitted bytes to fragment instances Key: IMPALA-7731 URL: https://issues.apache.org/jira/browse/IMPALA-7731 Project: IMPALA Issue Type: Improvement Components: Backend Affects Versions: Impala 3.1.0 Reporter: Lars Volker Attachments: Selective Scan Slowdowns.png Selective scans (and by extension selective fragment instances) take higher performance hits when reading data remotely. They can be identified by a low ratio between data being transmitted vs data being read from HDFS. To make it easier to spot those instances we should add this ratio to each instance and to the root of the execution profile. !Selective Scan Slowdowns.png! -- This message was sent by Atlassian JIRA (v7.6.3#76005)
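The proposed counter is simple to state: bytes transmitted downstream divided by bytes scanned, per fragment instance, with an aggregate at the profile root. A sketch with illustrative names (not actual Impala profile counters):

```python
# Sketch: transmitted/scanned ratio; a low value flags selective instances
# that are hit hardest by remote reads.
def transmit_scan_ratio(bytes_transmitted: int, bytes_scanned: int) -> float:
    """Ratio of bytes sent downstream to bytes scanned; 0.0 when nothing
    was scanned."""
    if bytes_scanned == 0:
        return 0.0
    return bytes_transmitted / bytes_scanned

def aggregate_ratio(instances) -> float:
    """Profile-root view: ratio over the sums across (transmitted, scanned)
    pairs, so large instances dominate as they do in wall-clock impact."""
    total_tx = sum(tx for tx, _sc in instances)
    total_sc = sum(sc for _tx, sc in instances)
    return transmit_scan_ratio(total_tx, total_sc)
```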
[jira] [Created] (IMPALA-7730) Improve ORC File Format Timezone issues
Philip Zeyliger created IMPALA-7730: --- Summary: Improve ORC File Format Timezone issues Key: IMPALA-7730 URL: https://issues.apache.org/jira/browse/IMPALA-7730 Project: IMPALA Issue Type: Task Components: Backend Affects Versions: Impala 3.0 Reporter: Philip Zeyliger As pointed out in https://gerrit.cloudera.org/#/c/11731 by [~csringhofer], our support for the ORC file format doesn't follow the same timezone conventions as the rest of Impala. {quote} tldr: ORC's timezone handling is likely to be broken in Impala so we should patch it in the toolchain The ORC library implements its own IANA timezone handling to convert stored timestamps from UTC to local time + do something similar for min/max stats. The writer's timezone can be also stored in .orc files and used instead of local timezone. Impala's and ORC library's timezone can be different because of several reasons: ORC's timezone is not overridden by env var TZ and query option timezone ORC uses a simpler way to detect the local timezone which may not work on some Linux distros (see TimezoneDatabase::LocalZoneName in Impala vs LOCAL_TIMEZONE in Orc) .orc files can use any time zone as writer's timezone and we cannot be sure that it will exist on the reader machine My suggestion is to patch the ORC library in the toolchain and remove timezone handling (e.g. by always using UTC, maybe depending on a flag), as the way it is currently working is likely to be broken and is surely not consistent with the rest of Impala. I am not sure how timezones could be handled correctly in Orc + Impala. If someone plans to work on it, I would gladly help in the integration to Impala. {quote} -- This message was sent by Atlassian JIRA (v7.6.3#76005)