[ https://issues.apache.org/jira/browse/SPARK-6282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14360534#comment-14360534 ]
Nicholas Chammas commented on SPARK-6282:
-----------------------------------------

[~joshrosen], [~davies]: Does this error look familiar to you?

> Strange Python import error when using random() in a lambda function
> --------------------------------------------------------------------
>
>                 Key: SPARK-6282
>                 URL: https://issues.apache.org/jira/browse/SPARK-6282
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 1.2.0
>         Environment: Kubuntu 14.04, Python 2.7.6
>            Reporter: Pavel Laskov
>            Priority: Minor
>
> Consider the exemplary Python code below:
>
>     from random import random
>     from pyspark.context import SparkContext
>     from xval_mllib import read_csv_file_as_list
>
>     if __name__ == "__main__":
>         sc = SparkContext(appName="Random() bug test")
>         data = sc.parallelize(read_csv_file_as_list('data/malfease-xp.csv'))
>         #data = sc.parallelize([1, 2, 3, 4, 5], 2)
>         d = data.map(lambda x: (random(), x))
>         print d.first()
>
> Data is read from a large CSV file. Running this code results in a Python
> import error:
>
>     ImportError: No module named _winreg
>
> If I use 'import random' and 'random.random()' in the lambda function no
> error occurs. Also no error occurs, for both kinds of import statements, for
> a small artificial data set like the one shown in a commented line.
>
> The full error trace, the source code of csv reading code (function
> 'read_csv_file_as_list' is my own) as well as a sample dataset (the original
> dataset is about 8M large) can be provided.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)