[issue34168] RAM consumption too high using concurrent.futures (Python 3.7 / 3.6)

2018-07-20 Thread Dem


Dem added the comment:

It seems that even without the as_completed call, the same problem occurs.


```
# -*- coding: utf-8 -*-
import dns.resolver
import concurrent.futures
from pprint import pprint
import json


bucket = json.load(open('30_million_strings.json', 'r'))


def _dns_query(target, **kwargs):
    global bucket
    resolv = dns.resolver.Resolver()
    resolv.timeout = kwargs['function']['dns_request_timeout']
    try:
        resolv.query(target + '.com', kwargs['function']['query_type'])
        with open('out.txt', 'a') as f:
            f.write(target + '\n')
    except Exception:
        pass


def run(**kwargs):
    global bucket
    temp_locals = locals()
    pprint({k: v for k, v in temp_locals.items()})

    with concurrent.futures.ThreadPoolExecutor(
            max_workers=kwargs['concurrency']['threads']) as executor:
        for element in bucket:
            executor.submit(kwargs['function']['name'], element, **kwargs)


run(function={'name': _dns_query, 'dns_request_timeout': 1, 'query_type': 'MX'},
    concurrency={'threads': 15})
```
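The memory growth here is consistent with how ThreadPoolExecutor works: submit() never blocks, so the loop creates all 30 million work items (and their Future objects) up front, long before 15 threads can drain them. A minimal sketch of one workaround, not from the original report, that throttles submission with a semaphore; `do_query` and `MAX_PENDING` are placeholder names:

```
# A minimal sketch (not from the original report): a BoundedSemaphore caps
# how many tasks can be queued or running at once, so the executor's
# internal queue and the Future objects never grow past MAX_PENDING.
import concurrent.futures
import threading

MAX_PENDING = 100  # upper bound on tasks queued or in flight

def do_query(target):
    pass  # placeholder for the real DNS lookup

def run_bounded(targets, workers=15):
    slots = threading.BoundedSemaphore(MAX_PENDING)
    with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as executor:
        for target in targets:
            slots.acquire()  # blocks while MAX_PENDING tasks are pending
            future = executor.submit(do_query, target)
            # release the slot as soon as this task finishes
            future.add_done_callback(lambda _: slots.release())

run_bounded('example%d' % i for i in range(1000))
```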

--

Python tracker <https://bugs.python.org/issue34168>

[issue34168] RAM consumption too high using concurrent.futures (Python 3.7 / 3.6)

2018-07-20 Thread Dem


New submission from Dem:

I have a list of 30 million strings, and I want to run a DNS query for each of them. I do not understand how this operation can become so memory intensive. I would assume that the threads would exit after each job is done, and there is also a 1-second timeout per query ({'dns_request_timeout': 1}).

Here is a sneak peek of the machine's resources while running the script:

https://i.stack.imgur.com/686SW.png

My code is as follows:

```
# -*- coding: utf-8 -*-
import dns.resolver
import concurrent.futures
from pprint import pprint
import json


bucket = json.load(open('30_million_strings.json', 'r'))


def _dns_query(target, **kwargs):
    global bucket
    resolv = dns.resolver.Resolver()
    resolv.timeout = kwargs['function']['dns_request_timeout']
    try:
        resolv.query(target + '.com', kwargs['function']['query_type'])
        with open('out.txt', 'a') as f:
            f.write(target + '\n')
    except Exception:
        pass


def run(**kwargs):
    global bucket
    temp_locals = locals()
    pprint({k: v for k, v in temp_locals.items()})

    with concurrent.futures.ThreadPoolExecutor(
            max_workers=kwargs['concurrency']['threads']) as executor:
        future_to_element = dict()

        for element in bucket:
            future = executor.submit(kwargs['function']['name'], element,
                                     **kwargs)
            future_to_element[future] = element

        for future in concurrent.futures.as_completed(future_to_element):
            result = future_to_element[future]


run(function={'name': _dns_query, 'dns_request_timeout': 1, 'query_type': 'MX'},
    concurrency={'threads': 15})
```
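Note that the future_to_element dict above also pins all 30 million futures in memory until the loop ends. A minimal sketch, not from the original report, of an alternative that submits the input in fixed-size chunks so only chunk_size futures exist at a time; `do_query` is again a placeholder:

```
# A minimal sketch (not from the original report): chunked submission keeps
# at most 'chunk_size' futures alive at a time, bounding memory regardless
# of how long the input iterable is.
import concurrent.futures
import itertools

def do_query(target):
    pass  # placeholder for the real DNS lookup

def run_chunked(targets, workers=15, chunk_size=1000):
    it = iter(targets)
    with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as executor:
        while True:
            chunk = list(itertools.islice(it, chunk_size))
            if not chunk:
                break
            futures = [executor.submit(do_query, t) for t in chunk]
            # process results as they finish, then move on to the next chunk
            for future in concurrent.futures.as_completed(futures):
                future.result()

run_chunked('example%d' % i for i in range(10000))
```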



--
components: Library (Lib)
messages: 322004
nosy: DemGiran
priority: normal
severity: normal
status: open
title: RAM consumption too high using concurrent.futures (Python 3.7 / 3.6)
versions: Python 3.6, Python 3.7

Python tracker <https://bugs.python.org/issue34168>