[ https://issues.apache.org/jira/browse/IMPALA-8178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16773419#comment-16773419 ]

Joe McDonnell commented on IMPALA-8178:
---------------------------------------

This is related to enabling the remote file handle cache in IMPALA-7265. At the 
moment, it looks like the JVM is allocating some native memory for each file 
handle. The issue does not reproduce when the file handle cache is disabled. 
Erasure coding seems to use "remote" file handles on the minicluster.

For now, it might make sense to disable the file handle cache for erasure 
coding.
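
If a workaround is needed before a real fix lands, the cache can be disabled 
via impalad startup flags. A sketch, assuming the flag added by IMPALA-7265 and 
the existing file handle cache option (verify the names against impalad --help 
on your build):
{noformat}
# Disable caching of remote file handles only (flag added by IMPALA-7265; name assumed):
impalad --cache_remote_file_handles=false ...

# Or disable the file handle cache entirely:
impalad --max_cached_file_handles=0 ...
{noformat}
Note that max_cached_file_handles=0 also gives up the (useful) caching of local 
handles, so the narrower remote-only flag is the better lever if your build has it.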

This is easy to reproduce. Create an erasure-coded minicluster, start Impala, 
and run "select count(*) from tpcds_parquet.store_sales". This increases memory 
consumption by more than 5 GB in total across the three impalads.
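
A sketch of those repro steps against the dev minicluster; the ERASURE_CODING 
toggle and script paths are from memory of the dev environment, so verify locally:
{noformat}
export ERASURE_CODING=true                 # assumed minicluster EC toggle
./testdata/bin/run-all.sh                  # restart the minicluster with EC enabled
./bin/start-impala-cluster.py              # start the usual 3-impalad cluster
./bin/impala-shell.sh -q "select count(*) from tpcds_parquet.store_sales"
# then watch RSS of the three impalad processes grow by 5+ GB combined
{noformat}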

Using pprof, here are the top allocations:
{noformat}
Total: 3766.0 MB
  3657.3  97.1%  97.1%   3657.4  97.1% JVM_FindSignal
    48.0   1.3%  98.4%     84.7   2.2% SUNWprivate_1.1
    10.5   0.3%  98.7%     10.5   0.3% inflate
     7.0   0.2%  98.9%      7.0   0.2% __gnu_cxx::new_allocator::allocate
     6.8   0.2%  99.0%      6.8   0.2% impala::SystemAllocator::AllocateViaMalloc
     6.5   0.2%  99.2%      6.5   0.2% __gnu_cxx::new_allocator::allocate (inline)
     6.4   0.2%  99.4%      6.4   0.2% Java_org_apache_hadoop_io_erasurecode_rawcoder_NativeRSRawDecoder_initImpl
     3.9   0.1%  99.5%      3.9   0.1% Java_java_util_zip_ZipFile_getZipMessage
{noformat}
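
Two notes on reading this: pprof attributes allocations made inside libjvm.so 
to the nearest exported symbol, so the JVM_FindSignal line is best read as 
"native memory allocated somewhere inside the JVM" rather than that specific 
function; and the NativeRSRawDecoder_initImpl entry is the HDFS erasure coding 
native decoder, which ties the growth to EC reads. Output like the above can be 
gathered with stock gperftools heap profiling, roughly as follows (tcmalloc is 
the dev-build allocator; paths are illustrative):
{noformat}
# Start the daemons with tcmalloc heap profiling enabled:
HEAPPROFILE=/tmp/impalad.hprof ./bin/start-impala-cluster.py
# ... run the repro query ...
pprof --text /path/to/impalad /tmp/impalad.hprof.0001.heap | head -20
{noformat}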

> Tests failing with “Could not allocate memory while trying to increase 
> reservation” on EC filesystem
> ----------------------------------------------------------------------------------------------------
>
>                 Key: IMPALA-8178
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8178
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>    Affects Versions: Impala 3.2.0
>            Reporter: Andrew Sherman
>            Assignee: Tim Armstrong
>            Priority: Blocker
>              Labels: broken-build
>
> In tests run against an Erasure Coding filesystem, multiple tests failed with 
> memory allocation errors.
> In total 10 tests failed:
>  * query_test.test_scanners.TestParquet.test_decimal_encodings
>  * query_test.test_scanners.TestTpchScanRangeLengths.test_tpch_scan_ranges
>  * query_test.test_exprs.TestExprs.test_exprs [enable_expr_rewrites: 0]
>  * query_test.test_exprs.TestExprs.test_exprs [enable_expr_rewrites: 1]
>  * query_test.test_hbase_queries.TestHBaseQueries.test_hbase_scan_node
>  * query_test.test_scanners.TestParquet.test_def_levels
>  * query_test.test_scanners.TestTextSplitDelimiters.test_text_split_across_buffers_delimiter
>  * query_test.test_hbase_queries.TestHBaseQueries.test_hbase_filters
>  * query_test.test_hbase_queries.TestHBaseQueries.test_hbase_inline_views
>  * query_test.test_hbase_queries.TestHBaseQueries.test_hbase_top_n
> The first failure looked like this on the client side:
> {quote}
> F 
> query_test/test_scanners.py::TestParquet::()::test_decimal_encodings[protocol:
>  beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, 
> 'disable_codegen_rows_threshold': 0, 'disable_codegen': True, 
> 'abort_on_error': 1, 'debug_action': 
> '-1:OPEN:SET_DENY_RESERVATION_PROBABILITY@0.5', 
> 'exec_single_node_rows_threshold': 0} | table_format: parquet/none]
>  query_test/test_scanners.py:717: in test_decimal_encodings
>      self.run_test_case('QueryTest/parquet-decimal-formats', vector, 
> unique_database)
>  common/impala_test_suite.py:472: in run_test_case
>      result = self.__execute_query(target_impalad_client, query, user=user)
>  common/impala_test_suite.py:699: in __execute_query
>      return impalad_client.execute(query, user=user)
>  common/impala_connection.py:174: in execute
>      return self.__beeswax_client.execute(sql_stmt, user=user)
>  beeswax/impala_beeswax.py:183: in execute
>      handle = self.__execute_query(query_string.strip(), user=user)
>  beeswax/impala_beeswax.py:360: in __execute_query
>      self.wait_for_finished(handle)
>  beeswax/impala_beeswax.py:381: in wait_for_finished
>      raise ImpalaBeeswaxException("Query aborted:" + error_log, None)
>  E   ImpalaBeeswaxException: ImpalaBeeswaxException:
>  E    Query aborted:ExecQueryFInstances rpc 
> query_id=6e44c3c949a31be2:f973c7ff00000000 failed: Failed to get minimum 
> memory reservation of 8.00 KB on daemon xxx.com:22001 for query 
> 6e44c3c949a31be2:f973c7ff00000000 due to following error: Memory limit 
> exceeded: Could not allocate memory while trying to increase reservation.
>  E   Query(6e44c3c949a31be2:f973c7ff00000000) could not allocate 8.00 KB 
> without exceeding limit.
>  E   Error occurred on backend xxx.com:22001
>  E   Memory left in process limit: 1.19 GB
>  E   Query(6e44c3c949a31be2:f973c7ff00000000): Reservation=0 
> ReservationLimit=9.60 GB OtherMemory=0 Total=0 Peak=0
>  E   Memory is likely oversubscribed. Reducing query concurrency or 
> configuring admission control may help avoid this error.
> {quote}
> On the server side log:
> {quote}
> I0207 18:25:19.329311  5562 impala-server.cc:1063] 
> 6e44c3c949a31be2:f973c7ff00000000] Registered query 
> query_id=6e44c3c949a31be2:f973c7ff00000000 
> session_id=93497065f69e9d01:8a3bd06faff3da5
> I0207 18:25:19.329434  5562 Frontend.java:1242] 
> 6e44c3c949a31be2:f973c7ff00000000] Analyzing query: select score from 
> decimal_stored_as_int32
> I0207 18:25:19.329583  5562 FeSupport.java:285] 
> 6e44c3c949a31be2:f973c7ff00000000] Requesting prioritized load of table(s): 
> test_decimal_encodings_28d99c0e.decimal_stored_as_int32
> I0207 18:25:30.776041  5562 Frontend.java:1282] 
> 6e44c3c949a31be2:f973c7ff00000000] Analysis finished.
> I0207 18:25:35.919486 10418 admission-controller.cc:608] 
> 6e44c3c949a31be2:f973c7ff00000000] Schedule for 
> id=6e44c3c949a31be2:f973c7ff00000000 in pool_name=default-pool 
> per_host_mem_estimate=16.02 MB PoolConfig: max_requests=-1 max_queued=200 
> max_mem=-1.00 B
> I0207 18:25:35.919528 10418 admission-controller.cc:613] 
> 6e44c3c949a31be2:f973c7ff00000000] Stats: agg_num_running=2, 
> agg_num_queued=0, agg_mem_reserved=24.13 MB,  
> local_host(local_mem_admitted=1.99 GB, num_admitted_running=2, num_queued=0, 
> backend_mem_reserved=8.06 MB)
> I0207 18:25:35.919549 10418 admission-controller.cc:645] 
> 6e44c3c949a31be2:f973c7ff00000000] Admitted query 
> id=6e44c3c949a31be2:f973c7ff00000000
> I0207 18:25:35.920532 10418 coordinator.cc:93] 
> 6e44c3c949a31be2:f973c7ff00000000] Exec() 
> query_id=6e44c3c949a31be2:f973c7ff00000000 stmt=select score from 
> decimal_stored_as_int32
> I0207 18:25:35.930855 10418 coordinator.cc:359] 
> 6e44c3c949a31be2:f973c7ff00000000] starting execution on 2 backends for 
> query_id=6e44c3c949a31be2:f973c7ff00000000
> I0207 18:25:35.938108 21110 impala-internal-service.cc:50] 
> 6e44c3c949a31be2:f973c7ff00000000] ExecQueryFInstances(): 
> query_id=6e44c3c949a31be2:f973c7ff00000000 coord=xxx.com:22000 #instances=1
> I0207 18:25:36.037228 10571 query-state.cc:624] 
> 6e44c3c949a31be2:f973c7ff00000000] Executing instance. 
> instance_id=6e44c3c949a31be2:f973c7ff00000000 fragment_idx=0 
> per_fragment_instance_idx=0 coord_state_idx=0 #in-flight=5
> I0207 18:25:48.149771 12581 coordinator-backend-state.cc:209] 
> ExecQueryFInstances rpc query_id=6e44c3c949a31be2:f973c7ff00000000 failed: 
> Failed to get minimum memory reservation of 8.00 KB on daemon xxx.com:22001 
> for query 6e44c3c949a31be2:f973c7ff00000000 due to following error: Memory 
> limit exceeded: Could not allocate memory while trying to increase 
> reservation.
> Query(6e44c3c949a31be2:f973c7ff00000000) could not allocate 8.00 KB without 
> exceeding limit.
> Query(6e44c3c949a31be2:f973c7ff00000000): Reservation=0 ReservationLimit=9.60 
> GB OtherMemory=0 Total=0 Peak=0
> I0207 18:25:48.149895 10418 coordinator.cc:373] 
> 6e44c3c949a31be2:f973c7ff00000000] started execution on 2 backends for 
> query_id=6e44c3c949a31be2:f973c7ff00000000
> I0207 18:25:48.152803 10418 coordinator.cc:527] 
> 6e44c3c949a31be2:f973c7ff00000000] ExecState: query 
> id=6e44c3c949a31be2:f973c7ff00000000 finstance=N/A on host=xxx.com (EXECUTING 
> -> ERROR) status=ExecQueryFInstances rpc 
> query_id=6e44c3c949a31be2:f973c7ff00000000 failed: Failed to get minimum 
> memory reservation of 8.00 KB on daemon xxx.com:22001 for query 
> 6e44c3c949a31be2:f973c7ff00000000 due to following error: Memory limit 
> exceeded: Could not allocate memory while trying to increase reservation.
> Query(6e44c3c949a31be2:f973c7ff00000000) could not allocate 8.00 KB without 
> exceeding limit.
> Query(6e44c3c949a31be2:f973c7ff00000000): Reservation=0 ReservationLimit=9.60 
> GB OtherMemory=0 Total=0 Peak=0
> I0207 18:25:48.152827 10418 coordinator-backend-state.cc:453] 
> 6e44c3c949a31be2:f973c7ff00000000] Sending CancelQueryFInstances rpc for 
> query_id=6e44c3c949a31be2:f973c7ff00000000 backend=127.0.0.1:27000
> I0207 18:25:48.155086 12737 control-service.cc:168] CancelQueryFInstances(): 
> query_id=6e44c3c949a31be2:f973c7ff00000000
> I0207 18:25:48.155109 12737 query-exec-mgr.cc:97] QueryState: 
> query_id=6e44c3c949a31be2:f973c7ff00000000 refcnt=4
> I0207 18:25:48.155117 12737 query-state.cc:649] Cancel: 
> query_id=6e44c3c949a31be2:f973c7ff00000000
> I0207 18:25:48.155129 12737 krpc-data-stream-mgr.cc:325] cancelling all 
> streams for fragment_instance_id=6e44c3c949a31be2:f973c7ff00000000
> I0207 18:25:48.155297 10418 coordinator.cc:687] 
> 6e44c3c949a31be2:f973c7ff00000000] CancelBackends() 
> query_id=6e44c3c949a31be2:f973c7ff00000000, tried to cancel 1 backends
> I0207 18:25:48.155306 10418 coordinator.cc:859] 
> 6e44c3c949a31be2:f973c7ff00000000] Release admission control resources for 
> query_id=6e44c3c949a31be2:f973c7ff00000000
> I0207 18:25:48.170018 10571 krpc-data-stream-mgr.cc:294] 
> 6e44c3c949a31be2:f973c7ff00000000] DeregisterRecvr(): 
> fragment_instance_id=6e44c3c949a31be2:f973c7ff00000000, node=1
> I0207 18:25:48.197767  5562 impala-beeswax-server.cc:239] close(): 
> query_id=6e44c3c949a31be2:f973c7ff00000000
> I0207 18:25:48.197775  5562 impala-server.cc:1142] UnregisterQuery(): 
> query_id=6e44c3c949a31be2:f973c7ff00000000
> I0207 18:25:48.197779  5562 impala-server.cc:1249] Cancel(): 
> query_id=6e44c3c949a31be2:f973c7ff00000000
> I0207 18:25:48.225905 10529 query-state.cc:272] 
> 6e44c3c949a31be2:f973c7ff00000000] UpdateBackendExecState(): last report for 
> 6e44c3c949a31be2:f973c7ff00000000
> I0207 18:25:48.225889 10571 query-state.cc:632] 
> 6e44c3c949a31be2:f973c7ff00000000] Instance completed. 
> instance_id=6e44c3c949a31be2:f973c7ff00000000 #in-flight=4 status=CANCELLED: 
> Cancelled
> I0207 18:25:48.372977 12737 control-service.cc:125] ReportExecStatus(): 
> Received report for unknown query ID (probably closed or cancelled): 
> 6e44c3c949a31be2:f973c7ff00000000 remote host=127.0.0.1:50422
> I0207 18:25:48.373118 10529 query-state.cc:431] 
> 6e44c3c949a31be2:f973c7ff00000000] Cancelling fragment instances as directed 
> by the coordinator. Returned status: ReportExecStatus(): Received report for 
> unknown query ID (probably closed or cancelled): 
> 6e44c3c949a31be2:f973c7ff00000000 remote host=127.0.0.1:50422
> I0207 18:25:48.373138 10529 query-state.cc:649] 
> 6e44c3c949a31be2:f973c7ff00000000] Cancel: 
> query_id=6e44c3c949a31be2:f973c7ff00000000
> I0207 18:25:48.429422  5562 query-exec-mgr.cc:184] ReleaseQueryState(): 
> deleted query_id=6e44c3c949a31be2:f973c7ff00000000
> {quote}


