[jira] [Commented] (SPARK-4348) pyspark.mllib.random conflicts with random module

2015-01-12 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14274333#comment-14274333
 ] 

Apache Spark commented on SPARK-4348:
-

User 'JoshRosen' has created a pull request for this issue:
https://github.com/apache/spark/pull/4011

> pyspark.mllib.random conflicts with random module
> -
>
> Key: SPARK-4348
> URL: https://issues.apache.org/jira/browse/SPARK-4348
> Project: Spark
> Issue Type: Bug
> Components: MLlib, PySpark
> Affects Versions: 1.1.0, 1.2.0
> Reporter: Davies Liu
> Assignee: Davies Liu
> Priority: Blocker
> Fix For: 1.2.0
>
>
> There are conflicts in two cases:
> 1. The random module is used by pyspark.mllib.feature. If the first entry of 
> sys.path is not '', the hack in pyspark/__init__.py fails to fix the 
> conflict.
> 2. When running the tests in mllib/xxx.py, the '' entry must be popped from 
> sys.path before anything is imported, or the run fails.
> The first case is not fully fixed for users and causes problems in some 
> cases, such as:
> {code}
> >>> import sys
> >>> sys.path.insert(0, PATH_OF_MODULE)
> >>> import pyspark
> >>> # using Word2Vec will now fail
> {code}
> I'd like to rename mllib/random.py to mllib/_random.py, then in 
> mllib/__init__.py:
> {code}
> import pyspark.mllib._random as random
> {code}
> cc [~mengxr] [~dorx]
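
The shadowing described in case 1 can be reproduced without Spark. The sketch below is only an illustration, not code from the issue: it assumes Python 3, uses a temporary directory as a stand-in for pyspark/mllib, and the helper name `resolve_origin` is hypothetical. It asks the path-based finder which file `random` would resolve to for a given search path.

```python
import os
import sys
import tempfile
from importlib.machinery import PathFinder


def resolve_origin(module_name, search_path):
    """Return the file a module name resolves to on search_path, or None."""
    # PathFinder.find_spec searches the given entries and ignores sys.modules,
    # so an already-imported stdlib `random` does not mask the result.
    spec = PathFinder.find_spec(module_name, search_path)
    return None if spec is None else spec.origin


with tempfile.TemporaryDirectory() as pkg_dir:
    # Stand-in for pyspark/mllib/random.py (assumption for this demo).
    shadow = os.path.join(pkg_dir, "random.py")
    with open(shadow, "w") as f:
        f.write("# local module that happens to be named random\n")

    # With the package directory first on the search path, the local file wins.
    shadowed_origin = resolve_origin("random", [pkg_dir] + sys.path)
    # Without it, the name resolves somewhere else (normally the stdlib).
    stdlib_origin = resolve_origin("random", list(sys.path))

    shadow_wins = os.path.samefile(shadowed_origin, shadow)
    stdlib_differs = stdlib_origin != shadowed_origin
```

Any directory that contains a file named random.py and sits ahead of the standard library on sys.path produces the same effect, which is why the order of sys.path entries matters here.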



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-4348) pyspark.mllib.random conflicts with random module

2014-11-13 Thread Xiangrui Meng (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14211768#comment-14211768
 ] 

Xiangrui Meng commented on SPARK-4348:
--

Note that after this fix, the bytecode file `random.pyc` will very likely 
still sit under `pyspark/mllib`. We need to remove it manually to prevent 
`import random` from picking up that file. 

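The manual cleanup could be scripted along these lines. This is a rough sketch under stated assumptions: it targets the Python 2 layout, where bytecode sits next to the source file, the function name `remove_stale_bytecode` is hypothetical, and a temporary directory stands in for `pyspark/mllib`.

```python
import os
import tempfile


def remove_stale_bytecode(package_dir, module_name="random"):
    """Delete leftover <module_name>.pyc/.pyo files and return their paths."""
    removed = []
    for suffix in (".pyc", ".pyo"):
        path = os.path.join(package_dir, module_name + suffix)
        if os.path.isfile(path):
            os.remove(path)
            removed.append(path)
    return removed


# Demo against a temporary directory standing in for pyspark/mllib.
with tempfile.TemporaryDirectory() as mllib_dir:
    stale = os.path.join(mllib_dir, "random.pyc")
    open(stale, "wb").close()  # simulate bytecode left over from random.py
    removed = remove_stale_bytecode(mllib_dir)
    still_there = os.path.exists(stale)
```

In a real checkout the function would be pointed at the actual `pyspark/mllib` directory instead of the temporary one.
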



[jira] [Commented] (SPARK-4348) pyspark.mllib.random conflicts with random module

2014-11-11 Thread Davies Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14207529#comment-14207529
 ] 

Davies Liu commented on SPARK-4348:
---

After some experiments, I found this is harder than expected. It still needs 
a hack to work (see the PR), but I think this hack is safer than the previous 
one:

1. The rand.py module does not shadow the default random module, so it is safe 
to run mllib/xxx.py without any hack, and no hack is needed to use random 
inside the mllib package.

2. The RandomModuleHook is installed only when the user imports 'pyspark.mllib', 
and it only handles 'pyspark.mllib.random'.

Note: To use the default random module, the caller module needs 'from 
__future__ import absolute_import', and more modules need it as well. Without 
it, 'import random' can be resolved as 'from pyspark.mllib import random', so 
there is a bug in master (Word2Vec).
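
To illustrate the idea behind point 2, here is a minimal sketch of an import hook that answers for exactly one module name and lets every other import fall through. It is not the code from the PR: it assumes Python 3 (`find_spec` API), and the names `SingleNameAliasHook`, `demo_rand`, and `demo_random_alias` are hypothetical stand-ins for the hook, `rand.py`, and `pyspark.mllib.random`.

```python
import importlib
import importlib.abc
import importlib.util
import sys
import types


class SingleNameAliasHook(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Serve one fixed module name from an existing module object."""

    def __init__(self, alias_name, target_module):
        self.alias_name = alias_name
        self.target_module = target_module

    def find_spec(self, fullname, path=None, target=None):
        # Claim only the alias; all other names are left to the normal finders.
        if fullname == self.alias_name:
            return importlib.util.spec_from_loader(fullname, self)
        return None

    def create_module(self, spec):
        # Hand back the implementation module instead of creating a new one.
        return self.target_module

    def exec_module(self, module):
        # Nothing to execute: the target module is already populated.
        pass


# Stand-in for the renamed implementation module (assumption for this demo).
rand = types.ModuleType("demo_rand")
rand.marker = "implementation module"

hook = SingleNameAliasHook("demo_random_alias", rand)
sys.meta_path.append(hook)
try:
    aliased = importlib.import_module("demo_random_alias")
    import random as stdlib_random  # unaffected by the hook
finally:
    # Undo the demo's global side effects.
    sys.meta_path.remove(hook)
    sys.modules.pop("demo_random_alias", None)
```

Because the hook is appended to the end of `sys.meta_path` and matches a single name, a plain `import random` still goes through the standard finders.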




[jira] [Commented] (SPARK-4348) pyspark.mllib.random conflicts with random module

2014-11-11 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14207511#comment-14207511
 ] 

Apache Spark commented on SPARK-4348:
-

User 'davies' has created a pull request for this issue:
https://github.com/apache/spark/pull/3216




[jira] [Commented] (SPARK-4348) pyspark.mllib.random conflicts with random module

2014-11-11 Thread Doris Xin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14207199#comment-14207199
 ] 

Doris Xin commented on SPARK-4348:
--

I fully support this. It took a lot of hacking just to override the default 
random module in Python, and it wasn't clear if the override was the ideal 
solution.
