[jira] [Resolved] (IMPALA-7681) Support new URI scheme for ADLS Gen2

2018-10-19 Thread Joe McDonnell (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell resolved IMPALA-7681.
---
   Resolution: Fixed
Fix Version/s: Impala 3.1.0

> Support new URI scheme for ADLS Gen2
> 
>
> Key: IMPALA-7681
> URL: https://issues.apache.org/jira/browse/IMPALA-7681
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Sean Mackrory
>Assignee: Sean Mackrory
>Priority: Major
> Fix For: Impala 3.1.0
>
>
> HADOOP-15407 recently added a new FileSystem implementation called "ABFS" for 
> the ADLS Gen2 service. Instead of being in the hadoop-azure-datalake module, 
> it's in the hadoop-azure module as a replacement for WASB.
> It should have pretty much the same filesystem semantics as ADLS, but URIs 
> are configured separately, so we'll need a new function to pick it up, even 
> if we treat it the same.
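A minimal sketch of what such scheme detection might look like (illustrative Python, not Impala's actual implementation; ABFS paths use the abfs:// scheme, or abfss:// over TLS, while ADLS Gen1 uses adl://):

```python
from urllib.parse import urlparse

def is_abfs_path(path):
    # ADLS Gen2 (ABFS) paths: abfs:// or abfss:// (TLS variant)
    return urlparse(path).scheme in ("abfs", "abfss")

def is_adls_gen1_path(path):
    # ADLS Gen1 paths use the adl:// scheme
    return urlparse(path).scheme == "adl"
```

The point of the issue is exactly this: even if both filesystems are treated identically downstream, a separate predicate is needed because the URI schemes differ.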



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (IMPALA-5068) Some username mappings were not respected

2018-10-19 Thread Jim Apple (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-5068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Apple resolved IMPALA-5068.
---
Resolution: Workaround

> Some username mappings were not respected
> -
>
> Key: IMPALA-5068
> URL: https://issues.apache.org/jira/browse/IMPALA-5068
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Infrastructure
>Reporter: Jim Apple
>Priority: Major
>
> Some users requested specific usernames and had them rejected by the import 
> process in favor of existing usernames with the same email address. One 
> example is [~alanchoi].





[jira] [Resolved] (IMPALA-5067) Sub-task order was not preserved

2018-10-19 Thread Jim Apple (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-5067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Apple resolved IMPALA-5067.
---
Resolution: Not A Problem

> Sub-task order was not preserved
> 
>
> Key: IMPALA-5067
> URL: https://issues.apache.org/jira/browse/IMPALA-5067
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Infrastructure
>Reporter: Jim Apple
>Priority: Major
>
> Some issues had their sub-task order rearranged during import, like 
> IMPALA-3902





[jira] [Created] (IMPALA-7733) TestInsertParquetQueries.test_insert_parquet is flaky in S3 due to rename

2018-10-19 Thread Vuk Ercegovac (JIRA)
Vuk Ercegovac created IMPALA-7733:
-

 Summary: TestInsertParquetQueries.test_insert_parquet is flaky in 
S3 due to rename
 Key: IMPALA-7733
 URL: https://issues.apache.org/jira/browse/IMPALA-7733
 Project: IMPALA
  Issue Type: Bug
  Components: Infrastructure
Affects Versions: Impala 3.1.0
Reporter: Vuk Ercegovac


I see two examples in the past two months or so where this test fails due to a 
rename error on S3. The test's stacktrace looks like this:
{noformat}
query_test/test_insert_parquet.py:112: in test_insert_parquet
self.run_test_case('insert_parquet', vector, unique_database, 
multiple_impalad=True)
common/impala_test_suite.py:408: in run_test_case
result = self.__execute_query(target_impalad_client, query, user=user)
common/impala_test_suite.py:625: in __execute_query
return impalad_client.execute(query, user=user)
common/impala_connection.py:160: in execute
return self.__beeswax_client.execute(sql_stmt, user=user)
beeswax/impala_beeswax.py:176: in execute
handle = self.__execute_query(query_string.strip(), user=user)
beeswax/impala_beeswax.py:350: in __execute_query
self.wait_for_finished(handle)
beeswax/impala_beeswax.py:371: in wait_for_finished
raise ImpalaBeeswaxException("Query aborted:" + error_log, None)
E   ImpalaBeeswaxException: ImpalaBeeswaxException:
EQuery aborted:Error(s) moving partition files. First error (of 1) was: 
Hdfs op (RENAME 
s3a:///test_insert_parquet_968f37fe.db/orders_insert_table/_impala_insert_staging/4e45cd68bcddd451_3c7156ed/.4e45cd68bcddd451-3c7156ed0002_803672621_dir/4e45cd68bcddd451-3c7156ed0002_448261088_data.0.parq
 TO 
s3a:///test-warehouse/test_insert_parquet_968f37fe.db/orders_insert_table/4e45cd68bcddd451-3c7156ed0002_448261088_data.0.parq)
 failed, error was: 
s3a:///test-warehouse/test_insert_parquet_968f37fe.db/orders_insert_table/_impala_insert_staging/4e45cd68bcddd451_3c7156ed/.4e45cd68bcddd451-3c7156ed0002_803672621_dir/4e45cd68bcddd451-3c7156ed0002_448261088_data.0.parq
E   Error(5): Input/output error{noformat}
Since we know this happens once in a while, some ideas to deflake it:
 * retry
 * check for this specific issue... if we think it's platform flakiness, then we 
should skip it.
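A hedged sketch of the retry idea (illustrative Python; the function names are invented and not the test framework's API). The statement is re-run a few times, re-raising only when the error is not the known transient S3 rename failure:

```python
import time

def run_with_retry(execute, query, max_attempts=3, backoff_s=0):
    """Run `query` via `execute`, retrying only the known transient failure."""
    for attempt in range(1, max_attempts + 1):
        try:
            return execute(query)
        except Exception as e:
            transient = "Error(s) moving partition files" in str(e)
            if not transient or attempt == max_attempts:
                raise  # unknown error, or retries exhausted
            time.sleep(backoff_s)  # brief pause before the next attempt
```

Checking for the specific error string keeps the deflaking narrow: genuine query failures still surface immediately.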





[jira] [Resolved] (IMPALA-7590) Stress test hit inconsistent results with TPCDS-Q18A

2018-10-19 Thread Thomas Tauber-Marshall (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Tauber-Marshall resolved IMPALA-7590.

Resolution: Cannot Reproduce

> Stress test hit inconsistent results with TPCDS-Q18A
> 
>
> Key: IMPALA-7590
> URL: https://issues.apache.org/jira/browse/IMPALA-7590
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.0, Impala 2.12.0
>Reporter: Michael Ho
>Assignee: Thomas Tauber-Marshall
>Priority: Blocker
>
> Recent runs of the stress test on a cluster with 135 nodes resulted in 
> inconsistent results every now and then for TPCDS-Q18a. The scale of TPC-DS is 
> 1.
> {noformat}
> --- result_correct.txt2018-09-10 08:54:30.427603941 -0700
> +++ result_incorrect.txt  2018-09-10 17:39:59.512926323 -0700
> @@ -1,3 +1,4 @@
> +opening 
> /tmp/stress/instance1/data/jenkins/workspace/impala-test-stress-secure-140node/archive/result_hashes/input.txt
>  
> +--++--+---+---++--++--+-+--+
>  | i_item_id| ca_country | ca_state | ca_county | agg1  | agg2   | 
> agg3 | agg4   | agg5 | agg6| agg7 |
>  
> +--++--+---+---++--++--+-+--+
> @@ -13,7 +14,7 @@
>  | AABM || IN   |   | 67.00 | 105.60 | 
> 2232.51  | 74.08  | -1114.55 | 1964.50 | 1.00 |
>  | AABNFAAA || IN   |   | 40.00 | 115.76 | 
> 0.00 | 70.61  | -459.60  | 1933.00 | 3.00 |
>  | AACBBAAA || IN   |   | 32.00 | 37.99  | 
> 0.00 | 8.73   | -448.64  | 1963.00 | 3.00 |
> -| AACC || IN   |   | 56.00 | 2.50   | 
> 0.00 | 0.62   | -62.72   | NULL| 4.00 |
> +| AACC || IN   |   | 56.00 | 2.50   | 
> 0.00 | 0.62   | -62.72   | 38463209| 4.00 |
>  | AACDCAAA || IN   |   | 30.00 | 53.19  | 
> 0.00 | 17.02  | -505.80  | 1990.00 | 6.00 |
>  | AACFDAAA || IN   |   | 58.00 | 113.96 | 
> 0.00 | 19.37  | -2148.90 | 1974.00 | 1.00 |
>  | AACHEAAA || IN   |   | 16.00 | 19.90  | 
> 0.00 | 13.13  | 9.76 | 1960.00 | 3.00 |
> @@ -101,4 +102,4 @@
>  | AAPKBAAA || IN   |   | 2.00  | 65.90  | 
> 0.00 | 58.65  | 60.24| 1954.00 | 3.00 |
>  | AAPO || IN   |   | 92.00 | 125.36 | 
> 0.00 | 94.02  | 1743.40  | 1963.00 | 6.00 |
>  | AAPODAAA || IN   |   | 75.00 | 119.08 | 
> 0.00 | 104.79 | 4501.50  | 1981.00 | 5.00 |
> -+--++--+---+---++--++--+-+--+
> \ No newline at end of file
> ++--++--+---+---++--++--+-+--+
> {noformat}
> The problem is not reproducible by running the query at Impala shell.
> The query is TPCDS Q18a:
> {noformat}
> with results as
>  (select i_item_id,
> ca_country,
> ca_state,
> ca_county,
> cast(cs_quantity as decimal(12,2)) agg1,
> cast(cs_list_price as decimal(12,2)) agg2,
> cast(cs_coupon_amt as decimal(12,2)) agg3,
> cast(cs_sales_price as decimal(12,2)) agg4,
> cast(cs_net_profit as decimal(12,2)) agg5,
> cast(c_birth_year as decimal(12,2)) agg6,
> cast(cd1.cd_dep_count as decimal(12,2)) agg7
>  from catalog_sales, customer_demographics cd1, customer_demographics cd2, 
> customer, customer_address, date_dim, item
>  where cs_sold_date_sk = d_date_sk and
>cs_item_sk = i_item_sk and
>cs_bill_cdemo_sk = cd1.cd_demo_sk and
>cs_bill_customer_sk = c_customer_sk and
>cd1.cd_gender = 'F' and
>cd1.cd_education_status = 'Unknown' and
>c_current_cdemo_sk = cd2.cd_demo_sk and
>c_current_addr_sk = ca_address_sk and
>c_birth_month in (1, 6, 8, 9, 12, 2) and
>d_year = 1998 and
>ca_state in ('MS', 'IN', 'ND', 'OK', 'NM', 'VA', 'MS')
>  )
>   select  i_item_id, ca_country, ca_state, ca_county, agg1, agg2, agg3, agg4, 
> agg5, agg6, agg7
>  from (
>   select i_item_id, ca_country, ca_state, ca_county, avg(agg1) agg1,
> avg(agg2) agg2, avg(agg3) agg3, avg(agg4) agg4, avg(agg5) agg5, avg(agg6) 
> agg6, avg(agg7) agg7
>   from results
>   group by i_item_id, ca_country, ca_state, ca_county
>   union all
>   select i_item_id, ca_country, ca_state, NULL as county, avg(agg1) agg1, 
> avg(agg2) agg2, avg(agg3) agg3,
> a

[jira] [Created] (IMPALA-7732) Check / Implement resource limits documented in IMPALA-5605

2018-10-19 Thread Michael Ho (JIRA)
Michael Ho created IMPALA-7732:
--

 Summary: Check / Implement resource limits documented in 
IMPALA-5605
 Key: IMPALA-7732
 URL: https://issues.apache.org/jira/browse/IMPALA-7732
 Project: IMPALA
  Issue Type: Task
  Components: Backend
Affects Versions: Impala 2.12.0, Impala 3.0
Reporter: Michael Ho


IMPALA-5605 documents a list of recommended bumps to system resource limits 
which may be necessary when running Impala at scale. We may consider checking 
those limits at startup with {{getrlimit()}} and potentially setting them with 
{{setrlimit()}} if possible. At a minimum, it may be helpful to log a warning 
message if a limit is below a certain threshold.
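A sketch of such a startup check (illustrative Python; Impala's check would be C++ {{getrlimit()}}, and the 8192 threshold here is an example, not a limit recommended by IMPALA-5605):

```python
import resource

def check_nofile_limit(min_soft=8192):
    """Return a warning string if the soft open-file limit looks too low."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft != resource.RLIM_INFINITY and soft < min_soft:
        return ("warning: open-file soft limit %d is below the recommended "
                "minimum %d" % (soft, min_soft))
    return None
```

The same pattern extends to other limits (e.g. RLIMIT_NPROC); logging a warning is the safe baseline, since raising limits with setrlimit() can fail without privileges.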





[jira] [Resolved] (IMPALA-5835) Severe slowdown in catalogd startup after 2.1 → 2.5 upgrade with > 200,000 databases

2018-10-19 Thread bharath v (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-5835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

bharath v resolved IMPALA-5835.
---
   Resolution: Cannot Reproduce
Fix Version/s: Not Applicable

Please try out the latest bits and reopen if needed.

> Severe slowdown in catalogd startup after 2.1 → 2.5 upgrade with > 200,000 
> databases
> 
>
> Key: IMPALA-5835
> URL: https://issues.apache.org/jira/browse/IMPALA-5835
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Affects Versions: Impala 2.6.0, Impala 2.7.0, Impala 2.5.5, Impala 2.8.0, 
> Impala 2.9.0, Impala 2.10.0
>Reporter: Ben Breakstone
>Assignee: bharath v
>Priority: Major
>  Labels: performance
> Fix For: Not Applicable
>
>
> After an upgrade from Impala 2.1 (CDH 5.3.9) to Impala 2.5 (CDH 5.7.5), 
> starting up Catalog Server takes around eight to ten hours. It took around 
> twenty minutes before the upgrade. 
> There are over 200,000 databases in use. Looking in the catalogd log as it 
> starts up for hours, it says 
> "Loading native functions for database..." and then 
> "Loading Java functions for database..." for each database. Based on this, it 
> appears the introduction of persistent UDFs/UDAs is causing the slowdown. 
> Only one of the databases actually has any UDFs defined. 
> num_metadata_loading_threads is set to 64. Background loading of metadata is 
> disabled. 





[jira] [Resolved] (IMPALA-6026) Refresh table failed with "UnsupportedOperationException: null"

2018-10-19 Thread bharath v (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-6026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

bharath v resolved IMPALA-6026.
---
   Resolution: Cannot Reproduce
Fix Version/s: Not Applicable

> Refresh table failed with "UnsupportedOperationException: null" 
> 
>
> Key: IMPALA-6026
> URL: https://issues.apache.org/jira/browse/IMPALA-6026
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Affects Versions: Impala 2.9.0
>Reporter: Juan Yu
>Assignee: bharath v
>Priority: Major
> Fix For: Not Applicable
>
>
> Invalidate metadata ts_part_200;
> then 
>refresh ts_part_200;
> it failed with the following error.
> The table has ~1.8K partitions, one file per partition, and it's an S3 table.
> Note that each partition is under a different location, not all under the 
> same directory.
> {code}
> I1006 20:05:57.034777 20373 TableLoader.java:97] Loaded metadata for: 
> default.ts_part_200
> I1006 20:05:57.035403  6470 jni-util.cc:176] 
> org.apache.impala.catalog.TableLoadingException: Failed to load metadata for 
> table: ts_part_200
>   at org.apache.impala.catalog.HdfsTable.load(HdfsTable.java:1091)
>   at org.apache.impala.catalog.HdfsTable.load(HdfsTable.java:1019)
>   at org.apache.impala.catalog.TableLoader.load(TableLoader.java:80)
>   at 
> org.apache.impala.catalog.TableLoadingMgr$2.call(TableLoadingMgr.java:237)
>   at 
> org.apache.impala.catalog.TableLoadingMgr$2.call(TableLoadingMgr.java:234)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.UnsupportedOperationException
>   at 
> com.google.common.collect.ImmutableCollection.add(ImmutableCollection.java:91)
>   at 
> org.apache.impala.catalog.HdfsTable.synthesizeBlockMetadata(HdfsTable.java:386)
>   at 
> org.apache.impala.catalog.HdfsTable.loadBlockMetadata(HdfsTable.java:297)
>   at 
> org.apache.impala.catalog.HdfsTable.loadMetadataAndDiskIds(HdfsTable.java:771)
>   at 
> org.apache.impala.catalog.HdfsTable.loadAllPartitions(HdfsTable.java:689)
>   at org.apache.impala.catalog.HdfsTable.load(HdfsTable.java:1082)
>   ... 8 more
> I1006 20:05:57.037637  6470 status.cc:122] TableLoadingException: Failed to 
> load metadata for table: ts_part_200
> CAUSED BY: UnsupportedOperationException: null
> @   0x83d879  impala::Status::Status()
> @   0xb98610  impala::JniUtil::GetJniExceptionMsg()
> @   0x8302eb  impala::Catalog::ResetMetadata()
> @   0x82366b  CatalogServiceThriftIf::ResetMetadata()
> @   0x8f69dd  
> impala::CatalogServiceProcessor::process_ResetMetadata()
> @   0x8f2b39  impala::CatalogServiceProcessor::dispatchCall()
> @   0x80e08c  apache::thrift::TDispatchProcessor::process()
> @   0xa0124f  
> apache::thrift::server::TAcceptQueueServer::Task::run()
> @   0x9fb939  impala::ThriftThread::RunRunnable()
> @   0x9fc392  
> boost::detail::function::void_function_obj_invoker0<>::invoke()
> @   0xbef309  impala::Thread::SuperviseThread()
> @   0xbefcc4  boost::detail::thread_data<>::run()
> @   0xe5810a  (unknown)
> @   0x30e5c07aa1  (unknown)
> @   0x30e58e8bcd  (unknown)
> E1006 20:05:57.037645  6470 catalog-server.cc:82] TableLoadingException: 
> Failed to load metadata for table: ts_part_200
> CAUSED BY: UnsupportedOperationException: null
> I1006 20:05:57.041628  6470 catalog-server.cc:86] ResetMetadata(): 
> response=TResetMetadataResponse {
>   01: result (struct) = TCatalogUpdateResult {
> 01: catalog_service_id (struct) = TUniqueId {
>   01: hi (i64) = 0,
>   02: lo (i64) = 0,
> },
> 02: version (i64) = 0,
> 03: status (struct) = TStatus {
>   01: status_code (i32) = 2,
>   02: error_msgs (list) = list[1] {
> [0] = "TableLoadingException: Failed to load metadata for table: 
> ts_part_200\nCAUSED BY: UnsupportedOperationException: null",
>   },
> },
>   },
> }
> I1006 20:05:57.041652  6470 rpc-trace.cc:200] RPC call: 
> catalog-server:CatalogService.ResetMetadata from 10.0.0.240:55490 took 
> 30s715ms
> I1006 20:05:57.346750  4595 rpc-trace.cc:190] RPC call: 
> StatestoreSubscriber.Heartbeat(from 10.0.0.200:52383)
> {code}





[jira] [Resolved] (IMPALA-7708) Switch to faster compression strategy for incremental stats

2018-10-19 Thread bharath v (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

bharath v resolved IMPALA-7708.
---
   Resolution: Fixed
Fix Version/s: Impala 3.1.0

> Switch to faster compression strategy for incremental stats
> ---
>
> Key: IMPALA-7708
> URL: https://issues.apache.org/jira/browse/IMPALA-7708
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Catalog
>Affects Versions: Impala 3.1.0
>Reporter: bharath v
>Assignee: bharath v
>Priority: Major
> Fix For: Impala 3.1.0
>
>
> Currently we set the Deflater mode to BEST_COMPRESSION by default.
> {noformat}
> public static byte[] deflateCompress(byte[] input) {
> if (input == null) return null;
> ByteArrayOutputStream bos = new ByteArrayOutputStream(input.length);
> // TODO: Benchmark other compression levels.
> DeflaterOutputStream stream =
> new DeflaterOutputStream(bos, new 
> Deflater(Deflater.BEST_COMPRESSION));
> {noformat}
> In some experiments, we noticed that the fastest compression mode 
> (BEST_SPEED) performs ~8x faster with only ~4% compression ratio penalty. 
> Here are some results on a real world table with 3000 partitions with 
> incremental stats.
>  
> | |Time taken for serialization (seconds)|OutputBytes size (MB)|
> |Gzip best compression|92|194|
> |Gzip fastest compression|11|212|
> |Gzip default compression|57|195|
> |No compression|5|452|
>  
>  
>  
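The tradeoff in the table can be approximated outside Impala, since Java's Deflater and zlib implement the same codec (illustrative Python; level 9 corresponds to BEST_COMPRESSION and level 1 to BEST_SPEED):

```python
import zlib

BEST_SPEED, BEST_COMPRESSION = 1, 9

def deflate_compress(data, level=BEST_SPEED):
    """Deflate-compress bytes at an explicit compression level (1..9)."""
    return zlib.compress(data, level)
```

Level 1 trades a small amount of output size for a large reduction in CPU time, which matches the ~8x speedup / ~4% size penalty observed in the experiments above.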





[jira] [Resolved] (IMPALA-7689) Improve size estimate for incremental stats

2018-10-19 Thread bharath v (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

bharath v resolved IMPALA-7689.
---
   Resolution: Fixed
Fix Version/s: Impala 3.1.0

> Improve size estimate for incremental stats
> ---
>
> Key: IMPALA-7689
> URL: https://issues.apache.org/jira/browse/IMPALA-7689
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Catalog
>Affects Versions: Impala 3.1.0
>Reporter: Vuk Ercegovac
>Assignee: bharath v
>Priority: Major
> Fix For: Impala 3.1.0
>
>
> After compressing incremental stats, their size estimate is now too 
> conservative. We use that size estimate to block the functionality (see the 
> corresponding expr in analysis and serialization in catalogd), so without 
> adjusting the estimate, the functionality will be blocked unnecessarily.





[jira] [Resolved] (IMPALA-7669) Concurrent invalidate with compute (or drop) stats throws NPE.

2018-10-19 Thread bharath v (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

bharath v resolved IMPALA-7669.
---
   Resolution: Fixed
Fix Version/s: Impala 3.1.0

> Concurrent invalidate with compute (or drop) stats throws NPE.
> --
>
> Key: IMPALA-7669
> URL: https://issues.apache.org/jira/browse/IMPALA-7669
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Affects Versions: Impala 3.1.0
>Reporter: bharath v
>Assignee: bharath v
>Priority: Critical
> Fix For: Impala 3.1.0
>
>
> *This is a Catalog V2 only bug*
> NPE is thrown when trying to getPartialInfo() from an IncompleteTable (result 
> of invalidate) and cause_ is null.
> {noformat}
> @Override
>   public TGetPartialCatalogObjectResponse getPartialInfo(
>   TGetPartialCatalogObjectRequest req) throws TableLoadingException {
> Throwables.propagateIfPossible(cause_, TableLoadingException.class);
> throw new TableLoadingException(cause_.getMessage());  <-
>   }
> {noformat}
> {noformat}
> I1004 16:51:28.845305 85380 jni-util.cc:308] java.lang.NullPointerException
> at 
> org.apache.impala.catalog.IncompleteTable.getPartialInfo(IncompleteTable.java:140)
> at 
> org.apache.impala.catalog.CatalogServiceCatalog.getPartialCatalogObject(CatalogServiceCatalog.java:2171)
> at 
> org.apache.impala.service.JniCatalog.getPartialCatalogObject(JniCatalog.java:236)
> {noformat}
> Actual caller stack trace is this.
> {noformat}
> I1004 16:51:21.666422 67179 Frontend.java:1086] Analyzing query: compute 
> stats ads
> I1004 16:51:28.850023 67179 jni-util.cc:308] 
> org.apache.impala.catalog.local.LocalCatalogException: Could not load table 
> parnal.ads from metastore
> at 
> org.apache.impala.catalog.local.LocalTable.loadTableMetadata(LocalTable.java:128)
> at org.apache.impala.catalog.local.LocalTable.load(LocalTable.java:89)
> at org.apache.impala.catalog.local.LocalDb.getTable(LocalDb.java:119)
> at 
> org.apache.impala.analysis.StmtMetadataLoader.getMissingTables(StmtMetadataLoader.java:251)
> at 
> org.apache.impala.analysis.StmtMetadataLoader.loadTables(StmtMetadataLoader.java:140)
> at 
> org.apache.impala.analysis.StmtMetadataLoader.loadTables(StmtMetadataLoader.java:116)
> at 
> org.apache.impala.service.Frontend.doCreateExecRequest(Frontend.java:1118)
> at 
> org.apache.impala.service.Frontend.getTExecRequest(Frontend.java:1092)
> at 
> org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1064)
> at 
> org.apache.impala.service.JniFrontend.createExecRequest(JniFrontend.java:158)
> Caused by: org.apache.thrift.TException: 
> TGetPartialCatalogObjectResponse(status:TStatus(status_code:GENERAL, 
> error_msgs:[NullPointerException: null]), lookup_status:OK)
> at 
> org.apache.impala.catalog.local.CatalogdMetaProvider.sendRequest(CatalogdMetaProvider.java:354)
> at 
> org.apache.impala.catalog.local.CatalogdMetaProvider.access$100(CatalogdMetaProvider.java:163)
> at 
> org.apache.impala.catalog.local.CatalogdMetaProvider$5.call(CatalogdMetaProvider.java:565)
> at 
> org.apache.impala.catalog.local.CatalogdMetaProvider$5.call(CatalogdMetaProvider.java:560)
> at 
> org.apache.impala.catalog.local.CatalogdMetaProvider$1.call(CatalogdMetaProvider.java:411)
> at 
> com.google.common.cache.LocalCache$LocalManualCache$1.load(LocalCache.java:4767)
> at 
> com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3568)
> at 
> com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2350)
> at 
> com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2313)
> at 
> com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2228)
> at com.google.common.cache.LocalCache.get(LocalCache.java:3965)
> at 
> com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4764)
> at 
> org.apache.impala.catalog.local.CatalogdMetaProvider.loadWithCaching(CatalogdMetaProvider.java:407)
> at 
> org.apache.impala.catalog.local.CatalogdMetaProvider.loadTable(CatalogdMetaProvider.java:556)
> at 
> org.apache.impala.catalog.local.LocalTable.loadTableMetadata(LocalTable.java:126)
> ... 9 more
> {noformat}





[jira] [Closed] (IMPALA-7723) Recognize int64 timestamps in CREATE TABLE LIKE PARQUET

2018-10-19 Thread Csaba Ringhofer (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Csaba Ringhofer closed IMPALA-7723.
---
Resolution: Invalid

> Recognize int64 timestamps in CREATE TABLE LIKE PARQUET
> ---
>
> Key: IMPALA-7723
> URL: https://issues.apache.org/jira/browse/IMPALA-7723
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Reporter: Csaba Ringhofer
>Priority: Minor
>  Labels: parquet
>
> IMPALA-5050 adds support for reading int64-encoded Parquet timestamps. These 
> columns have the int64 physical type, and converted/logical types have to be 
> used to differentiate them from BIGINTs. These columns can be read both as 
> BIGINTs and TIMESTAMPs depending on the table's schema.
> CREATE TABLE LIKE PARQUET could also convert these columns to TIMESTAMP 
> instead of BIGINT, but I decided to postpone adding this feature for two 
> reasons:
> 1. It could break the following possible workflow:
> - generate Parquet files (that contain int64 timestamps) with some tool
> - use Impala's CREATE TABLE LIKE PARQUET + LOAD DATA to make it accessible as 
> a table
> - run some queries that rely on interpreting these columns as integers
> CAST (col as BIGINT) in the query would make this even worse, as it would 
> convert timestamp to unix time in seconds instead of micros/millis without 
> any warning.
> 2. Adding support for int64 timestamps with nanoseconds precision will need 
> Impala's  parquet-hadoop-bundle dependency to be bumped to a new major 
> version, which may contain incompatible API changes.
> Note that parquet-hadoop-bundle is only used in CREATE TABLE LIKE PARQUET. 
> The C++ parts of Impala only rely on parquet.thrift, which can be updated 
> more easily.





[jira] [Resolved] (IMPALA-7717) Partition id does not exist exception - Catalog V2

2018-10-19 Thread bharath v (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

bharath v resolved IMPALA-7717.
---
   Resolution: Fixed
Fix Version/s: Impala 3.1.0

> Partition id does not exist exception - Catalog V2
> --
>
> Key: IMPALA-7717
> URL: https://issues.apache.org/jira/browse/IMPALA-7717
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Reporter: bharath v
>Assignee: bharath v
>Priority: Critical
> Fix For: Impala 3.1.0
>
> Attachments: IMPALA-7717-repro.patch
>
>
> Concurrent invalidates with partial RPC on partitioned tables can throw this 
> exception.
> {noformat}
> I1016 15:49:03.438048 30197 jni-util.cc:256] 
> java.lang.IllegalArgumentException: Partition id 162 does not exist
>   at 
> com.google.common.base.Preconditions.checkArgument(Preconditions.java:119)
>   at org.apache.impala.catalog.HdfsTable.getPartialInfo(HdfsTable.java:1711)
>   at 
> org.apache.impala.catalog.CatalogServiceCatalog.doGetPartialCatalogObject(CatalogServiceCatalog.java:2202)
>   at 
> org.apache.impala.catalog.CatalogServiceCatalog.getPartialCatalogObject(CatalogServiceCatalog.java:2141)
>   at 
> org.apache.impala.service.JniCatalog.getPartialCatalogObject(JniCatalog.java:237)
> I1016 15:49:03.440939 30197 status.cc:129] IllegalArgumentException: 
> Partition id 162 does not exist
> {noformat}
> {noformat}
>  @Override
>   public TGetPartialCatalogObjectResponse getPartialInfo(
>   TGetPartialCatalogObjectRequest req) throws TableLoadingException {
> 
> if (partIds != null) {
>   resp.table_info.partitions = 
> Lists.newArrayListWithCapacity(partIds.size());
>   for (long partId : partIds) {
> HdfsPartition part = partitionMap_.get(partId);
> Preconditions.checkArgument(part != null, "Partition id %s does not 
> exist",
> partId); <
> {noformat}
> The issue is that the invalidate command can reset the partition IDs and the 
> RPCs could look up with older IDs. 
> We should wrap this into an inconsistent metadata fetch exception and retry 
> rather than throwing a RTE.
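A sketch of the proposed fix (illustrative Python with invented names, not Impala's classes): raise a dedicated retryable exception and re-read the partition IDs on each attempt, so a concurrent invalidate that reassigns IDs no longer surfaces as a hard failure:

```python
class InconsistentMetadataFetchException(Exception):
    """Signals a stale-snapshot lookup that the caller should retry."""

def fetch_partitions(partition_map, part_ids):
    parts = []
    for part_id in part_ids:
        part = partition_map.get(part_id)
        if part is None:
            # Stale ID: the table was invalidated between RPCs.
            raise InconsistentMetadataFetchException(
                "Partition id %s does not exist" % part_id)
        parts.append(part)
    return parts

def fetch_with_retry(get_snapshot, max_attempts=3):
    for attempt in range(max_attempts):
        # Re-read the partition map and IDs on every attempt.
        partition_map, part_ids = get_snapshot()
        try:
            return fetch_partitions(partition_map, part_ids)
        except InconsistentMetadataFetchException:
            if attempt == max_attempts - 1:
                raise
```

The key design point is that the retry re-fetches the ID list rather than re-trying the same stale IDs, which is what distinguishes this from a blind retry.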





[jira] [Created] (IMPALA-7731) Add ratio between scanned and transmitted bytes to fragment instances

2018-10-19 Thread Lars Volker (JIRA)
Lars Volker created IMPALA-7731:
---

 Summary: Add ratio between scanned and transmitted bytes to 
fragment instances
 Key: IMPALA-7731
 URL: https://issues.apache.org/jira/browse/IMPALA-7731
 Project: IMPALA
  Issue Type: Improvement
  Components: Backend
Affects Versions: Impala 3.1.0
Reporter: Lars Volker
 Attachments: Selective Scan Slowdowns.png

Selective scans (and by extension selective fragment instances) take larger 
performance hits when reading data remotely. They can be identified by a low 
ratio between the data transmitted and the data read from HDFS. To make it 
easier to spot these instances we should add this ratio to each instance and to 
the root of the execution profile.

 !Selective Scan Slowdowns.png! 





[jira] [Created] (IMPALA-7730) Improve ORC File Format Timezone issues

2018-10-19 Thread Philip Zeyliger (JIRA)
Philip Zeyliger created IMPALA-7730:
---

 Summary: Improve ORC File Format Timezone issues
 Key: IMPALA-7730
 URL: https://issues.apache.org/jira/browse/IMPALA-7730
 Project: IMPALA
  Issue Type: Task
  Components: Backend
Affects Versions: Impala 3.0
Reporter: Philip Zeyliger


As pointed out in https://gerrit.cloudera.org/#/c/11731 by [~csringhofer], our 
support for the ORC file format doesn't follow the same timezone conventions as 
the rest of Impala.

{quote}
tldr: ORC's timezone handling is likely to be broken in Impala so we should 
patch it in the toolchain

The ORC library implements its own IANA timezone handling to convert stored 
timestamps from UTC to local time + do something similar for min/max stats. The 
writer's timezone can be also stored in .orc files and used instead of local 
timezone.

Impala's and the ORC library's timezones can differ for several reasons:

* ORC's timezone is not overridden by the env var TZ or the query option timezone
* ORC uses a simpler way to detect the local timezone which may not work on some 
Linux distros (see TimezoneDatabase::LocalZoneName in Impala vs LOCAL_TIMEZONE 
in Orc)
* .orc files can use any time zone as the writer's timezone and we cannot be sure 
that it will exist on the reader machine
My suggestion is to patch the ORC library in the toolchain and remove timezone 
handling (e.g. by always using UTC, maybe depending on a flag), as the way it 
is currently working is likely to be broken and is surely not consistent with 
the rest of Impala.

I am not sure how timezones could be handled correctly in Orc + Impala. If 
someone plans to work on it, I would gladly help in the integration to Impala.
{quote}




