Dhruve Ashar created SPARK-26827:
------------------------------------

             Summary: Support importing python modules having shared objects (.so)
                 Key: SPARK-26827
                 URL: https://issues.apache.org/jira/browse/SPARK-26827
             Project: Spark
          Issue Type: New Feature
          Components: PySpark
    Affects Versions: 2.4.0, 2.3.2
            Reporter: Dhruve Ashar


If a user wants to import dynamic modules, specifically ones containing .so 
files, Python does not allow this from a zip file 
(https://docs.python.org/3/library/zipimport.html), and Spark currently 
doesn't support it either. 

Files passed with the py-files option are placed on the PYTHONPATH but are not 
extracted, while files passed as archives are extracted but not placed on the 
PYTHONPATH. Dynamic modules can be loaded if they are both extracted and added 
to the PYTHONPATH.
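
To illustrate the "extract, then add to the path" step described above, here is a minimal sketch in plain Python. The helper name extract_and_add_to_path and the temporary-directory prefix are hypothetical; this is not PySpark code, only an outline of what would need to happen on each worker.

```python
import sys
import tempfile
import zipfile


def extract_and_add_to_path(archive_path, target_dir=None):
    """Extract a zip archive and put the extracted directory on sys.path.

    Hypothetical helper for illustration; not part of PySpark.
    """
    if target_dir is None:
        # Assumption: a fresh temporary directory is an acceptable location.
        target_dir = tempfile.mkdtemp(prefix="py_archive_")
    with zipfile.ZipFile(archive_path) as zf:
        zf.extractall(target_dir)
    # Prepend so the extracted copy is found before a zip entry of the same name.
    if target_dir not in sys.path:
        sys.path.insert(0, target_dir)
    return target_dir
```

Once the directory is on sys.path, the regular file-system importer handles any .so files inside it, which zipimport cannot do.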

Has anyone encountered this issue before? If so, what is the best way to 
handle it?

Some possible solutions:

1 - Work around the issue by passing the same archive with both the py-files 
and archives options; this extracts the archive and also adds it to the path. 
Gotcha: both have to be named the same. I have tested this and it works, but 
it's just a workaround.

2 - Add a new config such as py-archives that takes the given files, extracts 
them, and also adds them to the PYTHONPATH. Alternatively, examine the 
contents of the zip file and do the same if it contains dynamic modules. I am 
happy to work on the fix.
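
For the "examine the contents of the zip file" variant of option 2, the check could look roughly like the sketch below. The function name has_native_modules and the suffix list are assumptions made for illustration; a real implementation might need a broader definition of "dynamic module" (versioned names such as libfoo.so.1 are not matched here).

```python
import zipfile

# Assumed suffixes for native extension modules (Linux/macOS .so, Windows .pyd).
NATIVE_SUFFIXES = (".so", ".pyd")


def has_native_modules(archive_path):
    """Return True if the zip archive contains at least one native module.

    Hypothetical helper for illustration; not part of PySpark.
    """
    with zipfile.ZipFile(archive_path) as zf:
        # Only the archive's file listing is read; nothing is extracted here.
        return any(name.lower().endswith(NATIVE_SUFFIXES)
                   for name in zf.namelist())
```

An archive for which this returns True would be extracted and added to the PYTHONPATH, while pure-Python zips could keep the existing py-files behaviour.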



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

