[jira] [Commented] (SPARK-6282) Strange Python import error when using random() in a lambda function
[ https://issues.apache.org/jira/browse/SPARK-6282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14386564#comment-14386564 ]

Dmytro Bielievtsov commented on SPARK-6282:
---

I had the same strange issue when computing RDDs via the Blaze interface (ContinuumIO). Just as strangely as it appeared, it disappeared after I removed the 'six*' files from '/usr/lib/python2.7/dist-packages' and reinstalled 'six' with easy_install-2.7.

> Strange Python import error when using random() in a lambda function
> ---------------------------------------------------------------------
>
>                 Key: SPARK-6282
>                 URL: https://issues.apache.org/jira/browse/SPARK-6282
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 1.2.0
>         Environment: Kubuntu 14.04, Python 2.7.6
>            Reporter: Pavel Laskov
>            Priority: Minor
>
> Consider the exemplary Python code below:
>
> {code}
> from random import random
> from pyspark.context import SparkContext
> from xval_mllib import read_csv_file_as_list
>
> if __name__ == "__main__":
>     sc = SparkContext(appName="Random() bug test")
>     data = sc.parallelize(read_csv_file_as_list('data/malfease-xp.csv'))
>     # data = sc.parallelize([1, 2, 3, 4, 5], 2)
>     d = data.map(lambda x: (random(), x))
>     print d.first()
> {code}
>
> Data is read from a large CSV file. Running this code results in a Python import error:
>
> {code}
> ImportError: No module named _winreg
> {code}
>
> If I use 'import random' and 'random.random()' in the lambda function, no error occurs. Also, no error occurs, for both kinds of import statements, with a small artificial data set like the one shown in the commented line.
> The full error trace, the source code of the CSV reading code (the function 'read_csv_file_as_list' is my own), as well as a sample dataset (the original dataset is about 8M large), can be provided.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
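[Editor's note] The workaround above points at a stale distro-packaged copy of 'six' shadowing a newer one. Before deleting anything, a non-destructive way to check for duplicate copies is to scan sys.path yourself; this is a diagnostic sketch (plain Python, no Spark required), not part of the original report:

```python
import os
import sys

# Collect every place a `six` distribution could be loaded from.
# More than one hit (e.g. one under dist-packages and another under
# site-packages) is exactly the shadowing the workaround removes.
hits = []
for entry in sys.path:
    if not entry or not os.path.isdir(entry):
        continue
    for name in os.listdir(entry):
        if name == "six.py" or name.startswith("six-"):
            hits.append(os.path.join(entry, name))

for path in sorted(hits):
    print(path)
print("copies of six found: %d" % len(hits))
```

If this prints more than one location, the interpreter may be importing a different 'six' than the one you last installed.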
[jira] [Commented] (SPARK-6282) Strange Python import error when using random() in a lambda function
[ https://issues.apache.org/jira/browse/SPARK-6282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14363340#comment-14363340 ]

Pavel Laskov commented on SPARK-6282:
---

Hi Davies,

Yes, I was also quite baffled that everything works on a small artificial dataset. Here is an example that fails on my machine while being independent of both the real data I am using and any data-specific code on my part:

{code}
from random import random
# import random
from pyspark.context import SparkContext
from pyspark.mllib.rand import RandomRDDs
### Any of these imports causes the crash
from pyspark.mllib.tree import RandomForest, DecisionTreeModel
### from pyspark.mllib.linalg import SparseVector
### from pyspark.mllib.regression import LabeledPoint

if __name__ == "__main__":
    sc = SparkContext(appName="Random() bug test")
    data = RandomRDDs.normalVectorRDD(sc, numRows=1, numCols=200)
    d = data.map(lambda x: (random(), x))
    print d.first()
{code}

What breaks this code is the import of some mllib packages *even if they are not used* in the code (you can try any of the imports from the ### section). Another baffling thing is that nothing happens until some collection operation, like 'collect', 'top' or 'first': comment out the print statement and the error disappears.

Best regards from Munich,
---
Pavel Laskov
Principal Engineer, Security Product Innovation Team
T: +49 (0)89 158834-4170
E: pavel.las...@huawei.com
European Research Center, HUAWEI TECHNOLOGIES Duesseldorf GmbH
Riessstr. 25 C-3.0G, 80992 Munich
[jira] [Commented] (SPARK-6282) Strange Python import error when using random() in a lambda function
[ https://issues.apache.org/jira/browse/SPARK-6282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14361030#comment-14361030 ]

Davies Liu commented on SPARK-6282:
---

[~laskov] The following code runs fine here (master on Mac OS):

{code}
from random import random
from pyspark.context import SparkContext
from pyspark.mllib.random import RandomRDDs
from pyspark.mllib.tree import RandomForest, DecisionTreeModel
from pyspark.mllib.linalg import SparseVector
from pyspark.mllib.regression import LabeledPoint

if __name__ == "__main__":
    sc = SparkContext(appName="Random() bug test")
    data = sc.parallelize([1, 2, 3, 4, 5], 2)
    data = RandomRDDs.normalVectorRDD(sc, numRows=1, numCols=200)
    d = data.map(lambda x: (random(), x))
    print d.first()
{code}

Could you tell us exactly how to reproduce this problem?
[jira] [Commented] (SPARK-6282) Strange Python import error when using random() in a lambda function
[ https://issues.apache.org/jira/browse/SPARK-6282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14360534#comment-14360534 ]

Nicholas Chammas commented on SPARK-6282:
---

[~joshrosen], [~davies]: Does this error look familiar to you?
[jira] [Commented] (SPARK-6282) Strange Python import error when using random() in a lambda function
[ https://issues.apache.org/jira/browse/SPARK-6282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14360105#comment-14360105 ]

Pavel Laskov commented on SPARK-6282:
---

Hi Nicholas,

I am using the local driver to run the code on my laptop (Kubuntu 14.04). The same error occurs if I switch the import to

{code}
from numpy.random import random
{code}

Best regards from Munich,
---
Pavel Laskov
Principal Engineer, Security Product Innovation Team
T: +49 (0)89 158834-4170
E: pavel.las...@huawei.com
European Research Center, HUAWEI TECHNOLOGIES Duesseldorf GmbH
Riessstr. 25 C-3.0G, 80992 Munich
[jira] [Commented] (SPARK-6282) Strange Python import error when using random() in a lambda function
[ https://issues.apache.org/jira/browse/SPARK-6282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14359404#comment-14359404 ]

Nicholas Chammas commented on SPARK-6282:
---

This shouldn't be related to boto. "_winreg" appears to be something Python uses to access the Windows registry, which is strange. Please give us more details about your cluster setup, where you are running the driver from, etc. Also, what happens if you try numpy's implementation of {{random}}?
[jira] [Commented] (SPARK-6282) Strange Python import error when using random() in a lambda function
[ https://issues.apache.org/jira/browse/SPARK-6282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14359362#comment-14359362 ]

Sean Owen commented on SPARK-6282:
---

[~nchammas] or [~shivaram] might have a clue if it distantly relates to boto.
[jira] [Commented] (SPARK-6282) Strange Python import error when using random() in a lambda function
[ https://issues.apache.org/jira/browse/SPARK-6282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14359336#comment-14359336 ]

Joseph K. Bradley commented on SPARK-6282:
---

It looks like "winreg" is referenced in Spark's dependencies (specifically "boto", which is used for EC2). I'm not very familiar with that part, and it's strange to me that the error is ML-specific. If others here aren't sure, I'd try asking on the user list.
[jira] [Commented] (SPARK-6282) Strange Python import error when using random() in a lambda function
[ https://issues.apache.org/jira/browse/SPARK-6282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14358504#comment-14358504 ]

Pavel Laskov commented on SPARK-6282:
---

Hi Sean and Joseph,

Thanks for the quick reply to my bug report. I still think the problem is somewhere in Spark. Here is a self-contained code snippet which triggers the error on my system. Uncommenting any of the imports marked with ### causes a crash. Switching to "import random / random.random()" fixes the problem. None of the functions imported in the ### lines is used in the test code. Looks like a very obscure dependency of some mllib components on _winreg?

{code}
from random import random
# import random
from pyspark.context import SparkContext
from pyspark.mllib.rand import RandomRDDs
### Any of these imports causes the crash
### from pyspark.mllib.tree import RandomForest, DecisionTreeModel
### from pyspark.mllib.linalg import SparseVector
### from pyspark.mllib.regression import LabeledPoint

if __name__ == "__main__":
    sc = SparkContext(appName="Random() bug test")
    data = RandomRDDs.normalVectorRDD(sc, numRows=1, numCols=200)
    d = data.map(lambda x: (random(), x))
    print d.first()
{code}

Here is the full trace of the error:

{code}
Traceback (most recent call last):
  File "/home/laskov/research/pe-class/python/src/experiments/test_random.py", line 16, in <module>
    print d.first()
  File "/home/laskov/code/spark-1.2.1/python/pyspark/rdd.py", line 1139, in first
    rs = self.take(1)
  File "/home/laskov/code/spark-1.2.1/python/pyspark/rdd.py", line 1091, in take
    totalParts = self._jrdd.partitions().size()
  File "/home/laskov/code/spark-1.2.1/python/pyspark/rdd.py", line 2115, in _jrdd
    pickled_command = ser.dumps(command)
  File "/home/laskov/code/spark-1.2.1/python/pyspark/serializers.py", line 406, in dumps
    return cloudpickle.dumps(obj, 2)
  File "/home/laskov/code/spark-1.2.1/python/pyspark/cloudpickle.py", line 816, in dumps
    cp.dump(obj)
  File "/home/laskov/code/spark-1.2.1/python/pyspark/cloudpickle.py", line 133, in dump
    return pickle.Pickler.dump(self, obj)
  File "/usr/lib/python2.7/pickle.py", line 224, in dump
    self.save(obj)
  File "/usr/lib/python2.7/pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/lib/python2.7/pickle.py", line 562, in save_tuple
    save(element)
  File "/usr/lib/python2.7/pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "/home/laskov/code/spark-1.2.1/python/pyspark/cloudpickle.py", line 254, in save_function
    self.save_function_tuple(obj, [themodule])
  File "/home/laskov/code/spark-1.2.1/python/pyspark/cloudpickle.py", line 304, in save_function_tuple
    save((code, closure, base_globals))
  File "/usr/lib/python2.7/pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/lib/python2.7/pickle.py", line 548, in save_tuple
    save(element)
  File "/usr/lib/python2.7/pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/lib/python2.7/pickle.py", line 600, in save_list
    self._batch_appends(iter(obj))
  File "/usr/lib/python2.7/pickle.py", line 633, in _batch_appends
    save(x)
  File "/usr/lib/python2.7/pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "/home/laskov/code/spark-1.2.1/python/pyspark/cloudpickle.py", line 254, in save_function
    self.save_function_tuple(obj, [themodule])
  File "/home/laskov/code/spark-1.2.1/python/pyspark/cloudpickle.py", line 304, in save_function_tuple
    save((code, closure, base_globals))
  File "/usr/lib/python2.7/pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/lib/python2.7/pickle.py", line 548, in save_tuple
    save(element)
  File "/usr/lib/python2.7/pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/lib/python2.7/pickle.py", line 600, in save_list
    self._batch_appends(iter(obj))
  File "/usr/lib/python2.7/pickle.py", line 636, in _batch_appends
    save(tmp[0])
  File "/usr/lib/python2.7/pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "/home/laskov/code/spark-1.2.1/python/pyspark/cloudpickle.py", line 249, in save_function
    self.save_function_tuple(obj, modList)
  File "/home/laskov/code/spark-1.2.1/python/pyspark/cloudpickle.py", line 309, in save_function_tuple
    save(f_globals)
  File "/usr/lib/python2.7/pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "/home/laskov/code/spark-1.2.1/python/pyspark/cloudpickle.py", line 174, in save_dict
    pickle.Pickler.save_dict(self, obj)
  File "/usr/lib/python2.7/pickle.py", line 649, in save_dict
    self._batch_setitems(obj.iteritems())
  File "/usr/lib/pyth
{code}
[jira] [Commented] (SPARK-6282) Strange Python import error when using random() in a lambda function
[ https://issues.apache.org/jira/browse/SPARK-6282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14357382#comment-14357382 ]

Sean Owen commented on SPARK-6282:
---

http://stackoverflow.com/questions/11133506/importerror-while-importing-winreg-module-of-python

It sounds like something you are calling invokes a Windows-only Python library called winreg, but you're executing on Linux. This doesn't sound Spark-related, as Spark certainly does not invoke it.
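[Editor's note] Sean's point can be checked directly: _winreg (renamed winreg in Python 3) is a Windows-only standard-library module, so cross-platform code normally guards the import rather than assuming it exists. A sketch in Python 3 syntax:

```python
import platform

# winreg (Python 3; `_winreg` on Python 2) only exists on Windows,
# so portable code imports it defensively.
try:
    import winreg
except ImportError:
    winreg = None  # not on Windows; registry access unavailable

on_windows = platform.system() == "Windows"
print("winreg available:", winreg is not None)
# Availability should track the platform exactly.
assert (winreg is not None) == on_windows
```

An unguarded `import _winreg` anywhere on the import chain is enough to raise the ImportError seen in this issue on Linux.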
[jira] [Commented] (SPARK-6282) Strange Python import error when using random() in a lambda function
[ https://issues.apache.org/jira/browse/SPARK-6282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14357364#comment-14357364 ]

Joseph K. Bradley commented on SPARK-6282:
---

Do you know where "_winreg" appears in the code you're running? Is it being brought in by the read_csv_file_as_list method or its containing package?