[ https://issues.apache.org/jira/browse/SPARK-25958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ruslan Dautkhanov updated SPARK-25958:
--------------------------------------

Description: 

The following error happens on a heavy Spark job after 4 hours of runtime:

{code}
2018-11-06 14:35:56,604 - data_vault.py - ERROR - Exited with exception: [Errno 97] Address family not supported by protocol
Traceback (most recent call last):
  File "/home/mwincek/svn/data_vault/data_vault.py", line 64, in data_vault
    item.create_persistent_data()
  File "/home/mwincek/svn/data_vault/src/table_recipe/amf_table_recipe.py", line 53, in create_persistent_data
    single_obj.create_persistent_data()
  File "/home/mwincek/svn/data_vault/src/table_processing/table_processing.py", line 21, in create_persistent_data
    main_df = self.generate_dataframe_main()
  File "/home/mwincek/svn/data_vault/src/table_processing/table_processing.py", line 98, in generate_dataframe_main
    raw_disc_dv_df = self.get_raw_data_with_metadata_and_aggregation()
  File "/home/mwincek/svn/data_vault/src/table_processing/satellite_binary_dates_table_processing.py", line 16, in get_raw_data_with_metadata_and_aggregation
    main_df = self.get_dataframe_using_binary_date_aggregation_on_dataframe(input_df=raw_disc_dv_df)
  File "/home/mwincek/svn/data_vault/src/table_processing/satellite_binary_dates_table_processing.py", line 60, in get_dataframe_using_binary_date_aggregation_on_dataframe
    return_df = self.get_dataframe_from_binary_value_iteration(input_df)
  File "/home/mwincek/svn/data_vault/src/table_processing/satellite_binary_dates_table_processing.py", line 136, in get_dataframe_from_binary_value_iteration
    combine_df = self.get_dataframe_from_binary_value(input_df=input_df, binary_value=count)
  File "/home/mwincek/svn/data_vault/src/table_processing/satellite_binary_dates_table_processing.py", line 154, in get_dataframe_from_binary_value
    if len(results_of_filter_df.take(1)) == 0:
  File "/opt/cloudera/parcels/SPARK2/lib/spark2/python/pyspark/sql/dataframe.py", line 504, in take
    return self.limit(num).collect()
  File "/opt/cloudera/parcels/SPARK2/lib/spark2/python/pyspark/sql/dataframe.py", line 467, in collect
    return list(_load_from_socket(sock_info, BatchedSerializer(PickleSerializer())))
  File "/opt/cloudera/parcels/SPARK2/lib/spark2/python/pyspark/rdd.py", line 148, in _load_from_socket
    sock = socket.socket(af, socktype, proto)
  File "/opt/cloudera/parcels/Anaconda/lib/python2.7/socket.py", line 191, in __init__
    _sock = _realsocket(family, type, proto)
error: [Errno 97] Address family not supported by protocol
{code}

Looking at the failing code in lib/spark2/python/pyspark/rdd.py (line 148 is the socket.socket() call):

{code:python}
def _load_from_socket(sock_info, serializer):
    port, auth_secret = sock_info
    sock = None
    # Support for both IPv4 and IPv6.
    # On most of IPv6-ready systems, IPv6 will take precedence.
    for res in socket.getaddrinfo("localhost", port, socket.AF_UNSPEC, socket.SOCK_STREAM):
        af, socktype, proto, canonname, sa = res
        sock = socket.socket(af, socktype, proto)
        try:
            sock.settimeout(15)
            sock.connect(sa)
        except socket.error:
            sock.close()
            sock = None
            continue
        break
    if not sock:
        raise Exception("could not open socket")
    # The RDD materialization time is unpredicable, if we set a timeout for socket reading
    # operation, it will very possibly fail. See SPARK-18281.
    sock.settimeout(None)
    sockfile = sock.makefile("rwb", 65536)
    do_server_auth(sockfile, auth_secret)
    # The socket will be automatically closed when garbage-collected.
    return serializer.load_stream(sockfile)
{code}
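For context, here is a quick standalone probe (a sketch only, not Spark code; probe_localhost is a hypothetical helper) that enumerates what socket.getaddrinfo() returns for "localhost" with AF_UNSPEC and reports which of those address families the kernel will actually create a socket for:

{code:python}
# Sketch: list getaddrinfo() results for "localhost" and test socket creation.
# Assumption: if the resolver returns an AF_INET6 entry (e.g. "::1" from
# /etc/hosts) on a host where IPv6 is unusable, socket.socket() fails with
# errno 97 (EAFNOSUPPORT), which is the failure seen in the traceback above.
import errno
import socket

def probe_localhost(port=4040):  # hypothetical helper; the port value does not matter
    for af, socktype, proto, canonname, sa in socket.getaddrinfo(
            "localhost", port, socket.AF_UNSPEC, socket.SOCK_STREAM):
        try:
            sock = socket.socket(af, socktype, proto)  # the call that fails above
        except socket.error as e:
            tag = " (EAFNOSUPPORT)" if e.errno == errno.EAFNOSUPPORT else ""
            print("family %s %s -> REJECTED: %s%s" % (af, sa, e, tag))
            continue
        sock.close()
        print("family %s %s -> OK" % (af, sa))

probe_localhost()
{code}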
The culprit is this line in lib/spark2/python/pyspark/rdd.py:

{code:python}
socket.getaddrinfo("localhost", port, socket.AF_UNSPEC, socket.SOCK_STREAM)
{code}

So the "error: [Errno 97] *Address family* not supported by protocol" seems to be caused by passing socket.AF_UNSPEC as the third argument to the socket.getaddrinfo() call: with AF_UNSPEC the lookup can return an address family (for example AF_INET6) that the kernel then refuses, and since the socket.socket(af, socktype, proto) call sits outside the try/except block, the loop aborts on the first rejected family instead of falling through to the next result.

I tried a similar socket.getaddrinfo() call locally, outside of PySpark, and it worked fine.

RHEL 7.5.
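One possible hardening (a sketch only, based on the reading above; not necessarily the fix Spark adopted, and open_local_socket is a hypothetical name) is to create the socket inside the try/except, so that an address family the kernel rejects is skipped rather than aborting the whole loop:

{code:python}
import socket

def open_local_socket(port):
    # Defensive variant of the lookup loop in _load_from_socket() (sketch only):
    # socket.socket() now lives inside the try/except, so a family the kernel
    # rejects with errno 97 (EAFNOSUPPORT) is skipped, not raised to the caller.
    sock = None
    for res in socket.getaddrinfo("localhost", port, socket.AF_UNSPEC,
                                  socket.SOCK_STREAM):
        af, socktype, proto, canonname, sa = res
        try:
            sock = socket.socket(af, socktype, proto)
            sock.settimeout(15)
            sock.connect(sa)
        except socket.error:
            if sock is not None:
                sock.close()
            sock = None
            continue
        break
    if not sock:
        raise Exception("could not open socket")
    return sock
{code}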

> error: [Errno 97] Address family not supported by protocol in dataframe.take()
> ------------------------------------------------------------------------------
>
>                 Key: SPARK-25958
>                 URL: https://issues.apache.org/jira/browse/SPARK-25958
>             Project: Spark
>          Issue Type: New Feature
>          Components: PySpark, Spark Core
>    Affects Versions: 2.3.1, 2.3.2
>            Reporter: Ruslan Dautkhanov
>            Priority: Major

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org