[jira] [Commented] (SPARK-4348) pyspark.mllib.random conflicts with random module
[ https://issues.apache.org/jira/browse/SPARK-4348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14274333#comment-14274333 ]

Apache Spark commented on SPARK-4348:
-------------------------------------

User 'JoshRosen' has created a pull request for this issue:
https://github.com/apache/spark/pull/4011

> pyspark.mllib.random conflicts with random module
> -------------------------------------------------
>
>                 Key: SPARK-4348
>                 URL: https://issues.apache.org/jira/browse/SPARK-4348
>             Project: Spark
>          Issue Type: Bug
>          Components: MLlib, PySpark
>    Affects Versions: 1.1.0, 1.2.0
>            Reporter: Davies Liu
>            Assignee: Davies Liu
>            Priority: Blocker
>             Fix For: 1.2.0
>
>
> There are conflicts in two cases:
> 1. The standard random module is used by pyspark.mllib.feature. If the first
> entry of sys.path is not '', the hack in pyspark/__init__.py fails to resolve
> the conflict.
> 2. When running the tests in mllib/xxx.py, '' must be popped from sys.path
> before importing anything, or the tests will fail.
> The first case is not fully fixed for users; it still causes problems in
> some cases, such as:
> {code}
> >>> import sys
> >>> sys.path.insert(0, PATH_OF_MODULE)
> >>> import pyspark
> >>> # using Word2Vec will fail
> {code}
> I'd like to rename mllib/random.py to mllib/_random.py, and then in
> mllib/__init__.py:
> {code}
> import pyspark.mllib._random as random
> {code}
> cc [~mengxr] [~dorx]

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
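Both cases in the description above come down to how Python builds sys.path: when a file such as mllib/xxx.py is run directly, its own directory becomes sys.path[0] (the interactive interpreter and `python -c` use '' instead, meaning the current directory), so a random.py in that directory shadows the standard library's random. The following is a minimal sketch that reproduces this in a throwaway directory; the stand-in random.py, the script name xxx.py and the helper name stand_in_shadows_random are hypothetical. It also shows that popping sys.path[0] before any other import, as case 2 suggests, lets the standard module load again.

```python
import os
import subprocess
import sys
import tempfile

# Child script: optionally drop sys.path[0] (the script's own directory)
# before importing anything else, then report where `random` came from.
CHILD = """\
import sys
if {pop_first}:
    sys.path.pop(0)
import random
print(random.__file__)
"""


def stand_in_shadows_random(pop_first):
    """Return True if a local random.py shadows the stdlib in a child run."""
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical stand-in for pyspark/mllib/random.py.
        stand_in = os.path.join(tmp, "random.py")
        with open(stand_in, "w") as f:
            f.write("# stand-in for pyspark/mllib/random.py\n")
        # Plays the role of mllib/xxx.py, run directly as a script.
        script = os.path.join(tmp, "xxx.py")
        with open(script, "w") as f:
            f.write(CHILD.format(pop_first=pop_first))
        # -E ignores PYTHON* variables (e.g. PYTHONSAFEPATH); -S skips
        # site.py so no .pth file can pre-import the real random module.
        result = subprocess.run(
            [sys.executable, "-E", "-S", script],
            capture_output=True, text=True, check=True,
        )
        resolved = result.stdout.strip()
        return os.path.realpath(resolved) == os.path.realpath(stand_in)


if __name__ == "__main__":
    print(stand_in_shadows_random(pop_first=False))  # True: local file wins
    print(stand_in_shadows_random(pop_first=True))   # False: stdlib wins
```

Running the check in a subprocess keeps the parent interpreter's already-imported random module from masking the effect.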
[ https://issues.apache.org/jira/browse/SPARK-4348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14211768#comment-14211768 ]

Xiangrui Meng commented on SPARK-4348:
--------------------------------------

Note that after this fix, it is very likely that the bytecode file `random.pyc` still sits under `pyspark/mllib`. We need to remove it manually to prevent `import random` from picking up that file.
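Xiangrui's point can be made concrete: Python 2 will import a leftover random.pyc even when no random.py exists beside it, and Python 3 still does the same for a legacy .pyc placed directly in the package directory (only .pyc files inside __pycache__ are ignored once their source is gone). A minimal sketch of the manual cleanup step, assuming a hypothetical helper named remove_orphaned_pyc and that the caller passes in the package directory:

```python
import os


def remove_orphaned_pyc(directory):
    """Delete legacy *.pyc files in `directory` that have no matching *.py.

    Hypothetical helper: it only looks at the directory itself (not
    __pycache__) and returns the sorted names of the files it removed.
    """
    removed = []
    for name in os.listdir(directory):
        stem, ext = os.path.splitext(name)
        path = os.path.join(directory, name)
        if ext != ".pyc" or not os.path.isfile(path):
            continue
        if os.path.exists(os.path.join(directory, stem + ".py")):
            continue  # source still exists, so this .pyc is just a cache
        os.remove(path)  # e.g. random.pyc left behind after the rename
        removed.append(name)
    return sorted(removed)


if __name__ == "__main__":
    import sys

    # Usage (the path is an assumption about the checkout layout):
    #   python cleanup.py python/pyspark/mllib
    print(remove_orphaned_pyc(sys.argv[1]))
```

Only .pyc files without a matching source are touched, so caches for modules that still exist (such as a freshly compiled rand.pyc) are left alone.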
[ https://issues.apache.org/jira/browse/SPARK-4348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14207529#comment-14207529 ]

Davies Liu commented on SPARK-4348:
-----------------------------------

After some experiments, I found it's harder than expected. It still needs a hack to make it work (see the PR), but I think this hack is safer than the previous one:

1. The rand.py module will not shadow the default random module, so it's safe to run mllib/xxx.py without any hacking, and we no longer need a hack to use random inside the mllib package.
2. The RandomModuleHook is only installed when the user imports 'pyspark.mllib', and it only handles 'pyspark.mllib.random'.

Note: in order to use the default random module, the caller module needs 'from __future__ import absolute_import', and we need this in more modules as well. Without it, 'import random' can be resolved as 'from pyspark.mllib import random' (a Python 2 implicit relative import). So there is a bug in master (Word2Vec).
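The comment above describes a RandomModuleHook that redirects only 'pyspark.mllib.random'; the PR's code is not reproduced here. The following is a minimal Python 3 sketch of the same idea using importlib's meta-path finder API. This is a swapped-in technique: Python 2, which PySpark 1.x targeted, only offers the older find_module/load_module hooks. The class name ModuleAliasHook and the function install_alias are hypothetical.

```python
import importlib
import importlib.abc
import importlib.util
import sys


class ModuleAliasHook(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Make `import <alias>` return the importable module <target_name>.

    Hypothetical sketch. Only the exact dotted name `alias` is intercepted,
    so a top-level `import random` still reaches the standard library.
    """

    def __init__(self, alias, target_name):
        self.alias = alias              # e.g. "pyspark.mllib.random" (old)
        self.target_name = target_name  # e.g. "pyspark.mllib.rand" (new)

    def find_spec(self, fullname, path=None, target=None):
        if fullname != self.alias:
            return None  # not ours: defer to the regular finders
        return importlib.util.spec_from_loader(fullname, self)

    def create_module(self, spec):
        return None  # let the import system create a placeholder module

    def exec_module(self, module):
        # Replace the placeholder with the real module. The import system
        # returns whatever is in sys.modules[alias] after this call, so the
        # old and new names end up bound to the same module object.
        sys.modules[self.alias] = importlib.import_module(self.target_name)


def install_alias(alias, target_name):
    """Put the hook at the front of sys.meta_path and return it."""
    hook = ModuleAliasHook(alias, target_name)
    sys.meta_path.insert(0, hook)
    return hook
```

For instance, install_alias("pyspark.mllib.random", "pyspark.mllib.rand") would keep `import pyspark.mllib.random` working after the rename while the top-level random stays the standard module. Swapping the entry in exec_module, rather than returning the target from create_module, should also avoid the import machinery stamping the alias spec onto the real module.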
[ https://issues.apache.org/jira/browse/SPARK-4348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14207511#comment-14207511 ]

Apache Spark commented on SPARK-4348:
-------------------------------------

User 'davies' has created a pull request for this issue:
https://github.com/apache/spark/pull/3216
[ https://issues.apache.org/jira/browse/SPARK-4348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14207199#comment-14207199 ]

Doris Xin commented on SPARK-4348:
----------------------------------

I fully support this. It took a lot of hacking just to override the default random module in Python, and it wasn't clear if the override was the ideal solution.