[jira] [Commented] (SPARK-6282) Strange Python import error when using random() in a lambda function

2015-03-30 Thread Dmytro Bielievtsov (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14386564#comment-14386564
 ] 

Dmytro Bielievtsov commented on SPARK-6282:
---

I had the same strange issue when computing RDDs via the Blaze interface 
(ContinuumIO). Just as mysteriously as it appeared, it disappeared after I 
removed the 'six*' files from '/usr/lib/python2.7/dist-packages' and then 
reinstalled 'six' with easy_install-2.7.
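
A quick way to check which copy of 'six' the interpreter actually picks up, 
before and after such a cleanup (a hypothetical diagnostic using modern 
Python's importlib, not part of the original report):

```python
# Hypothetical diagnostic: report where the currently importable `six`
# lives, so you can confirm stale copies in dist-packages are gone.
import importlib.util

spec = importlib.util.find_spec("six")
print(spec.origin if spec else "six is not importable")
```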

> Strange Python import error when using random() in a lambda function
> 
>
> Key: SPARK-6282
> URL: https://issues.apache.org/jira/browse/SPARK-6282
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 1.2.0
> Environment: Kubuntu 14.04, Python 2.7.6
>Reporter: Pavel Laskov
>Priority: Minor
>
> Consider the exemplary Python code below:
> from random import random
> from pyspark.context import SparkContext
> from xval_mllib import read_csv_file_as_list
>
> if __name__ == "__main__":
>     sc = SparkContext(appName="Random() bug test")
>     data = sc.parallelize(read_csv_file_as_list('data/malfease-xp.csv'))
>     # data = sc.parallelize([1, 2, 3, 4, 5], 2)
>     d = data.map(lambda x: (random(), x))
>     print d.first()
> Data is read from a large CSV file. Running this code results in a Python 
> import error:
> ImportError: No module named _winreg
> If I use 'import random' and 'random.random()' in the lambda function, no 
> error occurs. Also, no error occurs, for either kind of import statement, with 
> a small artificial data set like the one shown in the commented-out line.  
> The full error trace, the source code of the CSV-reading function 
> ('read_csv_file_as_list' is my own), and a sample dataset (the original 
> dataset is about 8 MB) can be provided. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-6282) Strange Python import error when using random() in a lambda function

2015-03-16 Thread Pavel Laskov (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14363340#comment-14363340
 ] 

Pavel Laskov commented on SPARK-6282:
-

Hi Davies,

Yes, I was also quite baffled that everything works on a small artificial 
dataset. Here is an example that fails on my machine while being independent 
of the real data I am using, as well as of any data-specific code on my 
part:

from numpy.random import random
# import random
from pyspark.context import SparkContext
from pyspark.mllib.rand import RandomRDDs
### Any of these imports causes the crash
from pyspark.mllib.tree import RandomForest, DecisionTreeModel
### from pyspark.mllib.linalg import SparseVector
### from pyspark.mllib.regression import LabeledPoint

if __name__ == "__main__":
    sc = SparkContext(appName="Random() bug test")
    data = RandomRDDs.normalVectorRDD(sc, numRows=1, numCols=200)
    d = data.map(lambda x: (random(), x))
    print d.first()

What breaks this code is the import of some mllib packages *even if they are 
not used* in the code (you can try any imports from the ### section). Another 
baffling thing is that nothing happens until some collection operation, like 
'collect', 'top' or 'first'. Comment out the print statement and the error 
disappears. 
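
The delayed failure is consistent with RDD laziness: map() only records the 
lambda, and the closure is pickled and shipped to workers when an action runs. 
A plain-Python analogy of the same deferral (illustrative only, not PySpark 
code):

```python
# Plain-Python analogy for RDD laziness: the function passed to lazy_map
# is not called (so any error in it cannot surface) until the result is
# actually consumed, just as a PySpark lambda is only pickled and shipped
# once an action like first/collect/top triggers job submission.
def lazy_map(f, data):
    for x in data:
        yield f(x)  # deferred: runs only when the generator is consumed

mapped = lazy_map(lambda x: x * 2, [1, 2, 3])  # nothing has executed yet
first_value = next(mapped)  # evaluation (and any failure) happens here
```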

Best regards from Munich, 

---
Pavel Laskov
Principal Engineer, Security Product Innovation Team
T: + 49 (0)89 158834-4170
E: pavel.las...@huawei.com
European Research Center, HUAWEI TECHNOLOGIES Duesseldorf GmbH 
Riessstr. 25 C-3.0G, 80992 Munich






[jira] [Commented] (SPARK-6282) Strange Python import error when using random() in a lambda function

2015-03-13 Thread Davies Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14361030#comment-14361030
 ] 

Davies Liu commented on SPARK-6282:
---

[~laskov] The following code runs fine here (master on Mac OS):

{code}
from random import random
from pyspark.context import SparkContext
from pyspark.mllib.random import RandomRDDs
from pyspark.mllib.tree import RandomForest, DecisionTreeModel
from pyspark.mllib.linalg import SparseVector
from pyspark.mllib.regression import LabeledPoint

if __name__ == "__main__":
    sc = SparkContext(appName="Random() bug test")
    data = sc.parallelize([1, 2, 3, 4, 5], 2)
    data = RandomRDDs.normalVectorRDD(sc, numRows=1, numCols=200)
    d = data.map(lambda x: (random(), x))
    print d.first()
{code}

Could you tell exactly how to reproduce this problem?





[jira] [Commented] (SPARK-6282) Strange Python import error when using random() in a lambda function

2015-03-13 Thread Nicholas Chammas (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14360534#comment-14360534
 ] 

Nicholas Chammas commented on SPARK-6282:
-

[~joshrosen], [~davies]: Does this error look familiar to you?




[jira] [Commented] (SPARK-6282) Strange Python import error when using random() in a lambda function

2015-03-13 Thread Pavel Laskov (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14360105#comment-14360105
 ] 

Pavel Laskov commented on SPARK-6282:
-

Hi Nicholas,

I am using the local driver to run code on my laptop (Kubuntu 14.04).

The same error occurs if I switch the import to 

from numpy.random import random

Best regards from Munich, 

---
Pavel Laskov







[jira] [Commented] (SPARK-6282) Strange Python import error when using random() in a lambda function

2015-03-12 Thread Nicholas Chammas (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14359404#comment-14359404
 ] 

Nicholas Chammas commented on SPARK-6282:
-

Shouldn't be related to boto. "_winreg" appears to be something Python uses to 
access the Windows registry, which is strange.

Please give us more details about your cluster setup, where you are running the 
driver from, etc. Also, what if you try using numpy's implementation of 
{{random}}?




[jira] [Commented] (SPARK-6282) Strange Python import error when using random() in a lambda function

2015-03-12 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14359362#comment-14359362
 ] 

Sean Owen commented on SPARK-6282:
--

[~nchammas] or [~shivaram] might have a clue if it distantly relates to boto.




[jira] [Commented] (SPARK-6282) Strange Python import error when using random() in a lambda function

2015-03-12 Thread Joseph K. Bradley (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14359336#comment-14359336
 ] 

Joseph K. Bradley commented on SPARK-6282:
--

It looks like "winreg" is referenced in Spark's dependencies (specifically, 
"boto" which is used for ec2).  I'm not very familiar with that part, and it's 
strange to me that it's ML-specific.  If others here aren't sure, I'd try 
asking on the user list.




[jira] [Commented] (SPARK-6282) Strange Python import error when using random() in a lambda function

2015-03-12 Thread Pavel Laskov (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14358504#comment-14358504
 ] 

Pavel Laskov commented on SPARK-6282:
-

Hi Sean and Joseph,

Thanks for a quick reply to my bug report. I still think the problem is 
somewhere in Spark. Here is a self-contained code snippet which triggers the 
error on my system. Uncommenting any of the imports marked with ### causes a 
crash. Switching to "import random / random.random()" fixes the problem. None 
of the functions imported in the ### lines is used in the test code. Looks 
like a very obscure dependency of some mllib components on _winreg? 


from random import random
# import random
from pyspark.context import SparkContext
from pyspark.mllib.rand import RandomRDDs
### Any of these imports causes the crash
### from pyspark.mllib.tree import RandomForest, DecisionTreeModel
### from pyspark.mllib.linalg import SparseVector
### from pyspark.mllib.regression import LabeledPoint

if __name__ == "__main__":
    sc = SparkContext(appName="Random() bug test")
    data = RandomRDDs.normalVectorRDD(sc, numRows=1, numCols=200)
    d = data.map(lambda x: (random(), x))
    print d.first()


Here is the full trace of the error:

Traceback (most recent call last):
  File "/home/laskov/research/pe-class/python/src/experiments/test_random.py", line 16, in <module>
    print d.first()
  File "/home/laskov/code/spark-1.2.1/python/pyspark/rdd.py", line 1139, in first
    rs = self.take(1)
  File "/home/laskov/code/spark-1.2.1/python/pyspark/rdd.py", line 1091, in take
    totalParts = self._jrdd.partitions().size()
  File "/home/laskov/code/spark-1.2.1/python/pyspark/rdd.py", line 2115, in _jrdd
    pickled_command = ser.dumps(command)
  File "/home/laskov/code/spark-1.2.1/python/pyspark/serializers.py", line 406, in dumps
    return cloudpickle.dumps(obj, 2)
  File "/home/laskov/code/spark-1.2.1/python/pyspark/cloudpickle.py", line 816, in dumps
    cp.dump(obj)
  File "/home/laskov/code/spark-1.2.1/python/pyspark/cloudpickle.py", line 133, in dump
    return pickle.Pickler.dump(self, obj)
  File "/usr/lib/python2.7/pickle.py", line 224, in dump
    self.save(obj)
  File "/usr/lib/python2.7/pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/lib/python2.7/pickle.py", line 562, in save_tuple
    save(element)
  File "/usr/lib/python2.7/pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "/home/laskov/code/spark-1.2.1/python/pyspark/cloudpickle.py", line 254, in save_function
    self.save_function_tuple(obj, [themodule])
  File "/home/laskov/code/spark-1.2.1/python/pyspark/cloudpickle.py", line 304, in save_function_tuple
    save((code, closure, base_globals))
  File "/usr/lib/python2.7/pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/lib/python2.7/pickle.py", line 548, in save_tuple
    save(element)
  File "/usr/lib/python2.7/pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/lib/python2.7/pickle.py", line 600, in save_list
    self._batch_appends(iter(obj))
  File "/usr/lib/python2.7/pickle.py", line 633, in _batch_appends
    save(x)
  File "/usr/lib/python2.7/pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "/home/laskov/code/spark-1.2.1/python/pyspark/cloudpickle.py", line 254, in save_function
    self.save_function_tuple(obj, [themodule])
  File "/home/laskov/code/spark-1.2.1/python/pyspark/cloudpickle.py", line 304, in save_function_tuple
    save((code, closure, base_globals))
  File "/usr/lib/python2.7/pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/lib/python2.7/pickle.py", line 548, in save_tuple
    save(element)
  File "/usr/lib/python2.7/pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/lib/python2.7/pickle.py", line 600, in save_list
    self._batch_appends(iter(obj))
  File "/usr/lib/python2.7/pickle.py", line 636, in _batch_appends
    save(tmp[0])
  File "/usr/lib/python2.7/pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "/home/laskov/code/spark-1.2.1/python/pyspark/cloudpickle.py", line 249, in save_function
    self.save_function_tuple(obj, modList)
  File "/home/laskov/code/spark-1.2.1/python/pyspark/cloudpickle.py", line 309, in save_function_tuple
    save(f_globals)
  File "/usr/lib/python2.7/pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "/home/laskov/code/spark-1.2.1/python/pyspark/cloudpickle.py", line 174, in save_dict
    pickle.Pickler.save_dict(self, obj)
  File "/usr/lib/python2.7/pickle.py", line 649, in save_dict
    self._batch_setitems(obj.iteritems())
  File "/usr/lib/pyth

[jira] [Commented] (SPARK-6282) Strange Python import error when using random() in a lambda function

2015-03-11 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14357382#comment-14357382
 ] 

Sean Owen commented on SPARK-6282:
--

http://stackoverflow.com/questions/11133506/importerror-while-importing-winreg-module-of-python

It sounds like something you are calling invokes a Windows-only Python module 
called winreg, but you are executing on Linux.
This doesn't sound Spark-related; Spark itself certainly does not invoke it.
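
For reference, cross-platform libraries typically guard such imports rather 
than let them fail; a minimal sketch of the usual pattern (this assumes 
nothing about boto's actual code):

```python
# Minimal sketch of the usual cross-platform guard. On Linux the Windows-only
# registry module is absent, so the ImportError is swallowed and a sentinel
# is kept, instead of the import crashing at runtime (or at pickling time).
try:
    import _winreg  # Python 2 name; Python 3 renamed it to winreg
except ImportError:
    _winreg = None  # not on Windows: registry access simply unavailable

have_registry = _winreg is not None
```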




[jira] [Commented] (SPARK-6282) Strange Python import error when using random() in a lambda function

2015-03-11 Thread Joseph K. Bradley (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14357364#comment-14357364
 ] 

Joseph K. Bradley commented on SPARK-6282:
--

Do you know where "_winreg" appears in the code you're running?  Is it being 
brought in by the read_csv_file_as_list method or its containing package?
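
One way to answer that question is to search the installed sources; a 
hypothetical helper (the search root is left to the reader, e.g. the PySpark 
install directory):

```python
# Hypothetical helper: walk a source tree and list the .py files that
# mention _winreg, to pinpoint which dependency drags it in.
import os

def find_references(root, needle="_winreg"):
    hits = []
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".py"):
                path = os.path.join(dirpath, name)
                with open(path, errors="ignore") as handle:
                    if needle in handle.read():
                        hits.append(path)
    return sorted(hits)
```

Pointing it at the Spark python/ directory would show whether the reference 
comes from cloudpickle, a bundled dependency like boto, or user code.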
