[jira] [Commented] (SPARK-3869) ./bin/spark-class miss Java version with _JAVA_OPTIONS set

2014-10-13 Thread cocoatomo (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14169329#comment-14169329
 ] 

cocoatomo commented on SPARK-3869:
--

Hi [~pwendell], thank you for letting me know. Is it OK to use an abbreviated 
last name (e.g. "Barack O.")?

> ./bin/spark-class miss Java version with _JAVA_OPTIONS set
> --
>
> Key: SPARK-3869
> URL: https://issues.apache.org/jira/browse/SPARK-3869
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell
>Affects Versions: 1.2.0
> Environment: Mac OS X 10.9.5, Python 2.6.8, Java 1.8.0_20
>Reporter: cocoatomo
>
> When the _JAVA_OPTIONS environment variable is set, the command "java -version" 
> outputs an extra message like "Picked up _JAVA_OPTIONS: -Dfile.encoding=UTF-8".
> ./bin/spark-class reads the Java version from the first line of the "java -version" 
> output, so it misdetects the Java version when _JAVA_OPTIONS is set.
> commit: a85f24accd3266e0f97ee04d03c22b593d99c062






[jira] [Commented] (SPARK-3910) ./python/pyspark/mllib/classification.py doctests fails with module name pollution

2014-10-12 Thread cocoatomo (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14168786#comment-14168786
 ] 

cocoatomo commented on SPARK-3910:
--

Thank you for the comment.

I am running it at $SPARK_HOME. (Executing the "./bin/run-tests" command shows this.)
In addition, it is strange that the command
{noformat}
./bin/pyspark python/pyspark/mllib/classification.py
{noformat}
fails with a numpy ImportError.
So my environment has some trouble (sys.path is suspicious), and at the least there is 
some difference between the environments where PySpark runs.

I set up my environment using virtualenvwrapper with Python 2.6.8 (the default 
python executable on Mac OS X 10.9.5).
The ImportError mentioned in this issue occurred in that environment.
For comparison, I also tried testing in an environment whose Python version is 
2.7.8, and got the same error.

Is there some difference between our environments?
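
As a quick sanity check (a minimal snippet, nothing PySpark-specific), it helps to print 
where the "random" name resolves from, since the script's own directory is prepended to 
sys.path when a file is executed directly:

{noformat}
import sys

# When a script is executed directly, its directory becomes sys.path[0],
# so ./python/pyspark/mllib can shadow standard-library module names.
print(sys.path[0])

import random
# If this prints a path ending in pyspark/mllib/random.py, the local module
# is shadowing the standard library "random" module.
print(random.__file__)
{noformat}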

> ./python/pyspark/mllib/classification.py doctests fails with module name 
> pollution
> --
>
> Key: SPARK-3910
> URL: https://issues.apache.org/jira/browse/SPARK-3910
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 1.2.0
> Environment: Mac OS X 10.9.5, Python 2.6.8, Java 1.8.0_20, 
> Jinja2==2.7.3, MarkupSafe==0.23, Pygments==1.6, Sphinx==1.2.3, 
> argparse==1.2.1, docutils==0.12, flake8==2.2.3, mccabe==0.2.1, numpy==1.9.0, 
> pep8==1.5.7, psutil==2.1.3, pyflake8==0.1.9, pyflakes==0.8.1, 
> unittest2==0.5.1, wsgiref==0.1.2
>Reporter: cocoatomo
>  Labels: pyspark, testing
>
> In the ./python/run-tests script, we run the doctests in 
> ./pyspark/mllib/classification.py.
> The output is as follows:
> {noformat}
> $ ./python/run-tests
> ...
> Running test: pyspark/mllib/classification.py
> Traceback (most recent call last):
>   File "pyspark/mllib/classification.py", line 20, in 
> import numpy
>   File 
> "/Users/tomohiko/.virtualenvs/pyspark_py26/lib/python2.6/site-packages/numpy/__init__.py",
>  line 170, in 
> from . import add_newdocs
>   File 
> "/Users/tomohiko/.virtualenvs/pyspark_py26/lib/python2.6/site-packages/numpy/add_newdocs.py",
>  line 13, in 
> from numpy.lib import add_newdoc
>   File 
> "/Users/tomohiko/.virtualenvs/pyspark_py26/lib/python2.6/site-packages/numpy/lib/__init__.py",
>  line 8, in 
> from .type_check import *
>   File 
> "/Users/tomohiko/.virtualenvs/pyspark_py26/lib/python2.6/site-packages/numpy/lib/type_check.py",
>  line 11, in 
> import numpy.core.numeric as _nx
>   File 
> "/Users/tomohiko/.virtualenvs/pyspark_py26/lib/python2.6/site-packages/numpy/core/__init__.py",
>  line 46, in 
> from numpy.testing import Tester
>   File 
> "/Users/tomohiko/.virtualenvs/pyspark_py26/lib/python2.6/site-packages/numpy/testing/__init__.py",
>  line 13, in 
> from .utils import *
>   File 
> "/Users/tomohiko/.virtualenvs/pyspark_py26/lib/python2.6/site-packages/numpy/testing/utils.py",
>  line 15, in 
> from tempfile import mkdtemp
>   File 
> "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/tempfile.py",
>  line 34, in 
> from random import Random as _Random
>   File "/Users/tomohiko/MyRepos/Scala/spark/python/pyspark/mllib/random.py", 
> line 24, in 
> from pyspark.rdd import RDD
>   File "/Users/tomohiko/MyRepos/Scala/spark/python/pyspark/__init__.py", line 
> 51, in 
> from pyspark.context import SparkContext
>   File "/Users/tomohiko/MyRepos/Scala/spark/python/pyspark/context.py", line 
> 22, in 
> from tempfile import NamedTemporaryFile
> ImportError: cannot import name NamedTemporaryFile
> 0.07 real 0.04 user 0.02 sys
> Had test failures; see logs.
> {noformat}
> The problem is a cyclic import of the tempfile module.
> The cause is that the pyspark.mllib.random module lives in the same directory 
> as the pyspark.mllib.classification module.
> The classification module imports the numpy module, and numpy in turn imports 
> the tempfile module internally.
> Now the first entry of sys.path is the directory "./python/pyspark/mllib" (where 
> the executed file "classification.py" lives), so tempfile imports the 
> pyspark.mllib.random module (not the standard library "random" module).
> Finally, the import chain reaches tempfile again, forming a cyclic import.
> Summary: classification → numpy → tempfile → pyspark.mllib.random → tempfile 
> → (cyclic import!!)
> Furthermore, stat is a standard library module and a pyspark.mllib.stat 
> module also exists, so it may cause the same trouble.
> commit: 0e8203f4fb721158fb27897680da476174d24c4b
> A fundamental solution is to avoid module names that are already taken by the 
> standard library (currently "random" and "stat").
> A difficulty of this solution is that renaming pyspark.mllib.random and 
> pyspark.mllib.stat may break code that already uses them.

[jira] [Updated] (SPARK-3910) ./python/pyspark/mllib/classification.py doctests fails with module name pollution

2014-10-11 Thread cocoatomo (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

cocoatomo updated SPARK-3910:
-
Labels: pyspark testing  (was: )

> ./python/pyspark/mllib/classification.py doctests fails with module name 
> pollution
> --
>
> Key: SPARK-3910
> URL: https://issues.apache.org/jira/browse/SPARK-3910
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 1.2.0
> Environment: Mac OS X 10.9.5, Python 2.6.8, Java 1.8.0_20, 
> Jinja2==2.7.3, MarkupSafe==0.23, Pygments==1.6, Sphinx==1.2.3, 
> argparse==1.2.1, docutils==0.12, flake8==2.2.3, mccabe==0.2.1, numpy==1.9.0, 
> pep8==1.5.7, psutil==2.1.3, pyflake8==0.1.9, pyflakes==0.8.1, 
> unittest2==0.5.1, wsgiref==0.1.2
>Reporter: cocoatomo
>  Labels: pyspark, testing
>
> In the ./python/run-tests script, we run the doctests in 
> ./pyspark/mllib/classification.py.
> The output is as follows:
> {noformat}
> $ ./python/run-tests
> ...
> Running test: pyspark/mllib/classification.py
> Traceback (most recent call last):
>   File "pyspark/mllib/classification.py", line 20, in 
> import numpy
>   File 
> "/Users/tomohiko/.virtualenvs/pyspark_py26/lib/python2.6/site-packages/numpy/__init__.py",
>  line 170, in 
> from . import add_newdocs
>   File 
> "/Users/tomohiko/.virtualenvs/pyspark_py26/lib/python2.6/site-packages/numpy/add_newdocs.py",
>  line 13, in 
> from numpy.lib import add_newdoc
>   File 
> "/Users/tomohiko/.virtualenvs/pyspark_py26/lib/python2.6/site-packages/numpy/lib/__init__.py",
>  line 8, in 
> from .type_check import *
>   File 
> "/Users/tomohiko/.virtualenvs/pyspark_py26/lib/python2.6/site-packages/numpy/lib/type_check.py",
>  line 11, in 
> import numpy.core.numeric as _nx
>   File 
> "/Users/tomohiko/.virtualenvs/pyspark_py26/lib/python2.6/site-packages/numpy/core/__init__.py",
>  line 46, in 
> from numpy.testing import Tester
>   File 
> "/Users/tomohiko/.virtualenvs/pyspark_py26/lib/python2.6/site-packages/numpy/testing/__init__.py",
>  line 13, in 
> from .utils import *
>   File 
> "/Users/tomohiko/.virtualenvs/pyspark_py26/lib/python2.6/site-packages/numpy/testing/utils.py",
>  line 15, in 
> from tempfile import mkdtemp
>   File 
> "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/tempfile.py",
>  line 34, in 
> from random import Random as _Random
>   File "/Users/tomohiko/MyRepos/Scala/spark/python/pyspark/mllib/random.py", 
> line 24, in 
> from pyspark.rdd import RDD
>   File "/Users/tomohiko/MyRepos/Scala/spark/python/pyspark/__init__.py", line 
> 51, in 
> from pyspark.context import SparkContext
>   File "/Users/tomohiko/MyRepos/Scala/spark/python/pyspark/context.py", line 
> 22, in 
> from tempfile import NamedTemporaryFile
> ImportError: cannot import name NamedTemporaryFile
> 0.07 real 0.04 user 0.02 sys
> Had test failures; see logs.
> {noformat}
> The problem is a cyclic import of the tempfile module.
> The cause is that the pyspark.mllib.random module lives in the same directory 
> as the pyspark.mllib.classification module.
> The classification module imports the numpy module, and numpy in turn imports 
> the tempfile module internally.
> Now the first entry of sys.path is the directory "./python/pyspark/mllib" (where 
> the executed file "classification.py" lives), so tempfile imports the 
> pyspark.mllib.random module (not the standard library "random" module).
> Finally, the import chain reaches tempfile again, forming a cyclic import.
> Summary: classification → numpy → tempfile → pyspark.mllib.random → tempfile 
> → (cyclic import!!)
> Furthermore, stat is a standard library module and a pyspark.mllib.stat 
> module also exists, so it may cause the same trouble.
> commit: 0e8203f4fb721158fb27897680da476174d24c4b
> A fundamental solution is to avoid module names that are already taken by the 
> standard library (currently "random" and "stat").
> A difficulty of this solution is that renaming pyspark.mllib.random and 
> pyspark.mllib.stat may break code that already uses them.






[jira] [Created] (SPARK-3910) ./python/pyspark/mllib/classification.py doctests fails with module name pollution

2014-10-11 Thread cocoatomo (JIRA)
cocoatomo created SPARK-3910:


 Summary: ./python/pyspark/mllib/classification.py doctests fails 
with module name pollution
 Key: SPARK-3910
 URL: https://issues.apache.org/jira/browse/SPARK-3910
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark
Affects Versions: 1.2.0
 Environment: Mac OS X 10.9.5, Python 2.6.8, Java 1.8.0_20, 
Jinja2==2.7.3, MarkupSafe==0.23, Pygments==1.6, Sphinx==1.2.3, argparse==1.2.1, 
docutils==0.12, flake8==2.2.3, mccabe==0.2.1, numpy==1.9.0, pep8==1.5.7, 
psutil==2.1.3, pyflake8==0.1.9, pyflakes==0.8.1, unittest2==0.5.1, 
wsgiref==0.1.2
Reporter: cocoatomo


In the ./python/run-tests script, we run the doctests in 
./pyspark/mllib/classification.py.
The output is as follows:

{noformat}
$ ./python/run-tests
...
Running test: pyspark/mllib/classification.py
Traceback (most recent call last):
  File "pyspark/mllib/classification.py", line 20, in 
import numpy
  File 
"/Users/tomohiko/.virtualenvs/pyspark_py26/lib/python2.6/site-packages/numpy/__init__.py",
 line 170, in 
from . import add_newdocs
  File 
"/Users/tomohiko/.virtualenvs/pyspark_py26/lib/python2.6/site-packages/numpy/add_newdocs.py",
 line 13, in 
from numpy.lib import add_newdoc
  File 
"/Users/tomohiko/.virtualenvs/pyspark_py26/lib/python2.6/site-packages/numpy/lib/__init__.py",
 line 8, in 
from .type_check import *
  File 
"/Users/tomohiko/.virtualenvs/pyspark_py26/lib/python2.6/site-packages/numpy/lib/type_check.py",
 line 11, in 
import numpy.core.numeric as _nx
  File 
"/Users/tomohiko/.virtualenvs/pyspark_py26/lib/python2.6/site-packages/numpy/core/__init__.py",
 line 46, in 
from numpy.testing import Tester
  File 
"/Users/tomohiko/.virtualenvs/pyspark_py26/lib/python2.6/site-packages/numpy/testing/__init__.py",
 line 13, in 
from .utils import *
  File 
"/Users/tomohiko/.virtualenvs/pyspark_py26/lib/python2.6/site-packages/numpy/testing/utils.py",
 line 15, in 
from tempfile import mkdtemp
  File 
"/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/tempfile.py",
 line 34, in 
from random import Random as _Random
  File "/Users/tomohiko/MyRepos/Scala/spark/python/pyspark/mllib/random.py", 
line 24, in 
from pyspark.rdd import RDD
  File "/Users/tomohiko/MyRepos/Scala/spark/python/pyspark/__init__.py", line 
51, in 
from pyspark.context import SparkContext
  File "/Users/tomohiko/MyRepos/Scala/spark/python/pyspark/context.py", line 
22, in 
from tempfile import NamedTemporaryFile
ImportError: cannot import name NamedTemporaryFile
0.07 real 0.04 user 0.02 sys
Had test failures; see logs.
{noformat}

The problem is a cyclic import of the tempfile module.
The cause is that the pyspark.mllib.random module lives in the same directory 
as the pyspark.mllib.classification module.
The classification module imports the numpy module, and numpy in turn imports 
the tempfile module internally.
Now the first entry of sys.path is the directory "./python/pyspark/mllib" (where 
the executed file "classification.py" lives), so tempfile imports the 
pyspark.mllib.random module (not the standard library "random" module).
Finally, the import chain reaches tempfile again, forming a cyclic import.

Summary: classification → numpy → tempfile → pyspark.mllib.random → tempfile → 
(cyclic import!!)

Furthermore, stat is a standard library module and a pyspark.mllib.stat 
module also exists, so it may cause the same trouble.

commit: 0e8203f4fb721158fb27897680da476174d24c4b

A fundamental solution is to avoid module names that are already taken by the 
standard library (currently "random" and "stat").
A difficulty of this solution is that renaming pyspark.mllib.random and 
pyspark.mllib.stat may break code that already uses them.
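
For illustration only, here is a minimal sketch of the shadowing with a hypothetical 
"demo" directory (not the actual PySpark layout):

{noformat}
# demo/random.py -- shadows the standard-library "random" module; mimics the
# chain pyspark.mllib.random -> pyspark.context -> tempfile
from tempfile import NamedTemporaryFile

# demo/script.py -- run as "python demo/script.py"
import sys
print(sys.path[0])   # the "demo" directory, because the script is executed directly
import tempfile      # stdlib tempfile runs "from random import Random as _Random",
                     # which now resolves to demo/random.py; that file then tries to
                     # import from the half-initialized tempfile module, giving
                     # "ImportError: cannot import name NamedTemporaryFile"
{noformat}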






[jira] [Created] (SPARK-3909) A corrupted format in Sphinx documents and building warnings

2014-10-11 Thread cocoatomo (JIRA)
cocoatomo created SPARK-3909:


 Summary: A corrupted format in Sphinx documents and building 
warnings
 Key: SPARK-3909
 URL: https://issues.apache.org/jira/browse/SPARK-3909
 Project: Spark
  Issue Type: Bug
  Components: PySpark
Affects Versions: 1.2.0
 Environment: Mac OS X 10.9.5, Python 2.7.8, Jinja2==2.7.3, 
MarkupSafe==0.23, Pygments==1.6, Sphinx==1.2.3, docutils==0.12, numpy==1.9.0, 
wsgiref==0.1.2
Reporter: cocoatomo
Priority: Minor


The Sphinx documents contain corrupted ReST formatting and produce some warnings.

The purpose of this issue is the same as 
https://issues.apache.org/jira/browse/SPARK-3773.

commit: 0e8203f4fb721158fb27897680da476174d24c4b

output
{noformat}
$ cd ./python/docs
$ make clean html
rm -rf _build/*
sphinx-build -b html -d _build/doctrees   . _build/html
Making output directory...
Running Sphinx v1.2.3
loading pickled environment... not yet created
building [html]: targets for 4 source files that are out of date
updating environment: 4 added, 0 changed, 0 removed
reading sources... [100%] pyspark.sql   

  
/Users//MyRepos/Scala/spark/python/pyspark/mllib/feature.py:docstring of 
pyspark.mllib.feature.Word2VecModel.findSynonyms:4: WARNING: Field list ends 
without a blank line; unexpected unindent.
/Users//MyRepos/Scala/spark/python/pyspark/mllib/feature.py:docstring of 
pyspark.mllib.feature.Word2VecModel.transform:3: WARNING: Field list ends 
without a blank line; unexpected unindent.
/Users//MyRepos/Scala/spark/python/pyspark/sql.py:docstring of 
pyspark.sql:4: WARNING: Bullet list ends without a blank line; unexpected 
unindent.
looking for now-outdated files... none found
pickling environment... done
checking consistency... done
preparing documents... done
writing output... [100%] pyspark.sql

  
writing additional files... (12 module code pages) _modules/index search
copying static files... WARNING: html_static_path entry 
u'/Users//MyRepos/Scala/spark/python/docs/_static' does not exist
done
copying extra files... done
dumping search index... done
dumping object inventory... done
build succeeded, 4 warnings.

Build finished. The HTML pages are in _build/html.
{noformat}
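
As an illustration of the kind of ReST breakage behind the "Field list ends without a 
blank line; unexpected unindent" warnings (hypothetical functions, not the actual 
feature.py code): a continuation line of a field body that is not indented ends the 
field list early, and indenting it makes the docstring parse cleanly.

{noformat}
def transform_broken(word):
    """Transforms a word into its vector representation.

    :param word: a word whose vector representation
    should be looked up
    :return: the vector representation of the word
    """
    # The unindented continuation line above makes Sphinx report:
    # "Field list ends without a blank line; unexpected unindent."

def transform_fixed(word):
    """Transforms a word into its vector representation.

    :param word: a word whose vector representation
        should be looked up
    :return: the vector representation of the word
    """
    # Indenting the continuation keeps it inside the field body, so the
    # docstring parses without warnings.
{noformat}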






[jira] [Updated] (SPARK-3867) ./python/run-tests failed when it run with Python 2.6 and unittest2 is not installed

2014-10-10 Thread cocoatomo (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

cocoatomo updated SPARK-3867:
-
Description: 
./python/run-tests searches for a Python 2.6 executable on PATH and uses it if 
available.
When using Python 2.6, it tries to import the unittest2 module, which is *not* part 
of the standard library in Python 2.6, so it fails with an ImportError.

commit: 1d72a30874a88bdbab75217f001cf2af409016e7

  was:
./python/run-tests searches for a Python 2.6 executable on PATH and uses it if 
available.
When using Python 2.6, it tries to import the unittest2 module, which is *not* part 
of the standard library in Python 2.6, so it fails with an ImportError.


> ./python/run-tests failed when it run with Python 2.6 and unittest2 is not 
> installed
> 
>
> Key: SPARK-3867
> URL: https://issues.apache.org/jira/browse/SPARK-3867
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 1.2.0
> Environment: Mac OS X 10.9.5, Python 2.6.8, Java 1.8.0_20
>Reporter: cocoatomo
>  Labels: pyspark, testing
>
> ./python/run-tests searches for a Python 2.6 executable on PATH and uses it if 
> available.
> When using Python 2.6, it tries to import the unittest2 module, which is *not* 
> part of the standard library in Python 2.6, so it fails with an ImportError.
> commit: 1d72a30874a88bdbab75217f001cf2af409016e7






[jira] [Updated] (SPARK-3869) ./bin/spark-class miss Java version with _JAVA_OPTIONS set

2014-10-08 Thread cocoatomo (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

cocoatomo updated SPARK-3869:
-
Description: 
When the _JAVA_OPTIONS environment variable is set, the command "java -version" 
outputs an extra message like "Picked up _JAVA_OPTIONS: -Dfile.encoding=UTF-8".
./bin/spark-class reads the Java version from the first line of the "java -version" 
output, so it misdetects the Java version when _JAVA_OPTIONS is set.

commit: a85f24accd3266e0f97ee04d03c22b593d99c062

  was:
When the _JAVA_OPTIONS environment variable is set, the command "java -version" 
outputs an extra message like "Picked up _JAVA_OPTIONS: -Dfile.encoding=UTF-8".
./bin/spark-class reads the Java version from the first line of the "java -version" 
output, so it misdetects the Java version when _JAVA_OPTIONS is set.


> ./bin/spark-class miss Java version with _JAVA_OPTIONS set
> --
>
> Key: SPARK-3869
> URL: https://issues.apache.org/jira/browse/SPARK-3869
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell
>Affects Versions: 1.2.0
> Environment: Mac OS X 10.9.5, Python 2.6.8, Java 1.8.0_20
>Reporter: cocoatomo
>
> When the _JAVA_OPTIONS environment variable is set, the command "java -version" 
> outputs an extra message like "Picked up _JAVA_OPTIONS: -Dfile.encoding=UTF-8".
> ./bin/spark-class reads the Java version from the first line of the "java -version" 
> output, so it misdetects the Java version when _JAVA_OPTIONS is set.
> commit: a85f24accd3266e0f97ee04d03c22b593d99c062






[jira] [Created] (SPARK-3869) ./bin/spark-class miss Java version with _JAVA_OPTIONS set

2014-10-08 Thread cocoatomo (JIRA)
cocoatomo created SPARK-3869:


 Summary: ./bin/spark-class miss Java version with _JAVA_OPTIONS set
 Key: SPARK-3869
 URL: https://issues.apache.org/jira/browse/SPARK-3869
 Project: Spark
  Issue Type: Bug
  Components: Spark Shell
Affects Versions: 1.2.0
 Environment: Mac OS X 10.9.5, Python 2.6.8, Java 1.8.0_20
Reporter: cocoatomo


When the _JAVA_OPTIONS environment variable is set, the command "java -version" 
outputs an extra message like "Picked up _JAVA_OPTIONS: -Dfile.encoding=UTF-8".
./bin/spark-class reads the Java version from the first line of the "java -version" 
output, so it misdetects the Java version when _JAVA_OPTIONS is set.
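
For illustration, a rough sketch of version detection that tolerates the extra line 
(written in Python only to show the idea; the real logic lives in the ./bin/spark-class 
shell script, and the function name here is made up):

{noformat}
import re
import subprocess

def java_major_minor():
    # "java -version" prints to stderr; with _JAVA_OPTIONS set, the first line
    # is "Picked up _JAVA_OPTIONS: ..." instead of the version line, so we
    # scan all lines for the one that actually carries the version string.
    out = subprocess.check_output(["java", "-version"], stderr=subprocess.STDOUT)
    for line in out.decode().splitlines():
        m = re.search(r'version "(\d+)\.(\d+)', line)
        if m:
            return int(m.group(1)), int(m.group(2))
    raise RuntimeError("could not determine the Java version")

print(java_major_minor())   # e.g. (1, 8) for Java 1.8.0_20
{noformat}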






[jira] [Updated] (SPARK-3867) ./python/run-tests failed when it run with Python 2.6 and unittest2 is not installed

2014-10-08 Thread cocoatomo (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

cocoatomo updated SPARK-3867:
-
Issue Type: Sub-task  (was: Bug)
Parent: SPARK-3866

> ./python/run-tests failed when it run with Python 2.6 and unittest2 is not 
> installed
> 
>
> Key: SPARK-3867
> URL: https://issues.apache.org/jira/browse/SPARK-3867
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 1.2.0
> Environment: Mac OS X 10.9.5, Python 2.6.8, Java 1.8.0_20
>Reporter: cocoatomo
>  Labels: pyspark, testing
>
> ./python/run-tests searches for a Python 2.6 executable on PATH and uses it if 
> available.
> When using Python 2.6, it tries to import the unittest2 module, which is *not* 
> part of the standard library in Python 2.6, so it fails with an ImportError.






[jira] [Updated] (SPARK-3868) Hard to recognize which module is tested from unit-tests.log

2014-10-08 Thread cocoatomo (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

cocoatomo updated SPARK-3868:
-
Issue Type: Sub-task  (was: Bug)
Parent: SPARK-3866

> Hard to recognize which module is tested from unit-tests.log
> 
>
> Key: SPARK-3868
> URL: https://issues.apache.org/jira/browse/SPARK-3868
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 1.2.0
> Environment: Mac OS X 10.9.5, Python 2.6.8, Java 1.8.0_20
>Reporter: cocoatomo
>  Labels: pyspark, testing
>
> The ./python/run-tests script displays messages about which test it is currently 
> running on stdout, but does not write them to unit-tests.log.
> This makes it harder to recognize which test programs were executed and which 
> test failed.






[jira] [Created] (SPARK-3868) Hard to recognize which module is tested from unit-tests.log

2014-10-08 Thread cocoatomo (JIRA)
cocoatomo created SPARK-3868:


 Summary: Hard to recognize which module is tested from 
unit-tests.log
 Key: SPARK-3868
 URL: https://issues.apache.org/jira/browse/SPARK-3868
 Project: Spark
  Issue Type: Bug
  Components: PySpark
Affects Versions: 1.2.0
 Environment: Mac OS X 10.9.5, Python 2.6.8, Java 1.8.0_20
Reporter: cocoatomo


The ./python/run-tests script displays messages about which test it is currently 
running on stdout, but does not write them to unit-tests.log.
This makes it harder to recognize which test programs were executed and which 
test failed.
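
One possible shape of a fix (a sketch only; the actual ./python/run-tests is a shell 
script and run_module_tests here is a made-up helper) is to write a separator naming 
the module into unit-tests.log before its tests run:

{noformat}
import logging

logging.basicConfig(filename="unit-tests.log", level=logging.INFO,
                    format="%(asctime)s %(message)s")

def run_module_tests(module_name, run):
    # A marker line in unit-tests.log lets a failure further down the log be
    # traced back to the module that was being tested.
    logging.info("===== Running tests in %s =====", module_name)
    try:
        run()
    except Exception:
        logging.exception("Tests in %s failed", module_name)
        raise
{noformat}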






[jira] [Created] (SPARK-3867) ./python/run-tests failed when it run with Python 2.6 and unittest2 is not installed

2014-10-08 Thread cocoatomo (JIRA)
cocoatomo created SPARK-3867:


 Summary: ./python/run-tests failed when it run with Python 2.6 and 
unittest2 is not installed
 Key: SPARK-3867
 URL: https://issues.apache.org/jira/browse/SPARK-3867
 Project: Spark
  Issue Type: Bug
  Components: PySpark
Affects Versions: 1.2.0
 Environment: Mac OS X 10.9.5, Python 2.6.8, Java 1.8.0_20
Reporter: cocoatomo


./python/run-tests searches for a Python 2.6 executable on PATH and uses it if 
available.
When using Python 2.6, it tries to import the unittest2 module, which is *not* part 
of the standard library in Python 2.6, so it fails with an ImportError.
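
A minimal sketch of a guarded import (one possible direction, not necessarily the fix 
run-tests should take): fall back or fail with a clear message instead of crashing 
with a bare ImportError on Python 2.6.

{noformat}
import sys

if sys.version_info[:2] <= (2, 6):
    try:
        import unittest2 as unittest   # backport of the Python 2.7 unittest features
    except ImportError:
        sys.exit("Python 2.6 detected: install unittest2 to run the PySpark tests")
else:
    import unittest
{noformat}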






[jira] [Updated] (SPARK-3866) Clean up python/run-tests problems

2014-10-08 Thread cocoatomo (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

cocoatomo updated SPARK-3866:
-
Attachment: unit-tests.log

An output from ./python/run-tests

> Clean up python/run-tests problems
> --
>
> Key: SPARK-3866
> URL: https://issues.apache.org/jira/browse/SPARK-3866
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 1.2.0
> Environment: Mac OS X 10.9.5, Python 2.7.8, Java 1.8.0_20
>Reporter: cocoatomo
>  Labels: pyspark, testing
> Attachments: unit-tests.log
>
>
> This issue is an overhaul issue to remove the problems I encountered when running 
> ./python/run-tests at commit a85f24accd3266e0f97ee04d03c22b593d99c062.
> It will have sub-tasks for the different kinds of issues.
> A test output is contained in the attached file.






[jira] [Updated] (SPARK-3866) Clean up python/run-tests problems

2014-10-08 Thread cocoatomo (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

cocoatomo updated SPARK-3866:
-
Attachment: (was: unit-tests.log)

> Clean up python/run-tests problems
> --
>
> Key: SPARK-3866
> URL: https://issues.apache.org/jira/browse/SPARK-3866
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 1.2.0
> Environment: Mac OS X 10.9.5, Python 2.7.8, Java 1.8.0_20
>Reporter: cocoatomo
>  Labels: pyspark, testing
>
> This issue is an overhaul issue to remove the problems I encountered when running 
> ./python/run-tests at commit a85f24accd3266e0f97ee04d03c22b593d99c062.
> It will have sub-tasks for the different kinds of issues.
> A test output is contained in the attached file.






[jira] [Updated] (SPARK-3866) Clean up python/run-tests problems

2014-10-08 Thread cocoatomo (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

cocoatomo updated SPARK-3866:
-
Environment: Mac OS X 10.9.5, Python 2.7.8, Java 1.8.0_20  (was: Mac OS X 
10.9.5, Python 2.7.8, IPython 2.2.0, Java 1.8.0_20)

> Clean up python/run-tests problems
> --
>
> Key: SPARK-3866
> URL: https://issues.apache.org/jira/browse/SPARK-3866
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 1.2.0
> Environment: Mac OS X 10.9.5, Python 2.7.8, Java 1.8.0_20
>Reporter: cocoatomo
>  Labels: pyspark, testing
> Attachments: unit-tests.log
>
>
> This issue is an overhaul issue to remove the problems I encountered when running 
> ./python/run-tests at commit a85f24accd3266e0f97ee04d03c22b593d99c062.
> It will have sub-tasks for the different kinds of issues.
> A test output is contained in the attached file.






[jira] [Updated] (SPARK-3866) Clean up python/run-tests problems

2014-10-08 Thread cocoatomo (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

cocoatomo updated SPARK-3866:
-
Attachment: unit-tests.log

An output from ./python/run-tests

> Clean up python/run-tests problems
> --
>
> Key: SPARK-3866
> URL: https://issues.apache.org/jira/browse/SPARK-3866
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 1.2.0
> Environment: Mac OS X 10.9.5, Python 2.7.8, IPython 2.2.0, Java 
> 1.8.0_20
>Reporter: cocoatomo
>  Labels: pyspark, testing
> Attachments: unit-tests.log
>
>
> This issue is an overhaul issue to remove the problems I encountered when running 
> ./python/run-tests at commit a85f24accd3266e0f97ee04d03c22b593d99c062.
> It will have sub-tasks for the different kinds of issues.
> A test output is contained in the attached file.






[jira] [Updated] (SPARK-3866) Clean up python/run-tests problems

2014-10-08 Thread cocoatomo (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

cocoatomo updated SPARK-3866:
-
Description: 
This issue is an overhaul issue to remove the problems I encountered when running 
./python/run-tests at commit a85f24accd3266e0f97ee04d03c22b593d99c062.
It will have sub-tasks for the different kinds of issues.

A test output is contained in the attached file.

  was:
This issue is an overhaul issue to remove the problems I encountered when running 
./python/run-tests at commit a85f24accd3266e0f97ee04d03c22b593d99c062.
It will have sub-tasks for the different kinds of issues.

Contents of unit-tests.log:
{noformat}

{noformat}


> Clean up python/run-tests problems
> --
>
> Key: SPARK-3866
> URL: https://issues.apache.org/jira/browse/SPARK-3866
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 1.2.0
> Environment: Mac OS X 10.9.5, Python 2.7.8, IPython 2.2.0, Java 
> 1.8.0_20
>Reporter: cocoatomo
>  Labels: pyspark, testing
>
> This issue is an overhaul issue to remove the problems I encountered when running 
> ./python/run-tests at commit a85f24accd3266e0f97ee04d03c22b593d99c062.
> It will have sub-tasks for the different kinds of issues.
> A test output is contained in the attached file.






[jira] [Updated] (SPARK-3866) Clean up python/run-tests problems

2014-10-08 Thread cocoatomo (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

cocoatomo updated SPARK-3866:
-
Description: 
This issue is an overhaul issue to remove the problems I encountered when running 
./python/run-tests at commit a85f24accd3266e0f97ee04d03c22b593d99c062.
It will have sub-tasks for the different kinds of issues.

Contents of unit-tests.log:
{noformat}

{noformat}

  was:
This issue is an overhaul issue to remove the problems I encountered when running 
./python/run-tests at commit a85f24accd3266e0f97ee04d03c22b593d99c062.
It will have sub-tasks for the different kinds of issues.

Contents of unit-tests.log:
{noformat}
{noformat}


> Clean up python/run-tests problems
> --
>
> Key: SPARK-3866
> URL: https://issues.apache.org/jira/browse/SPARK-3866
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 1.2.0
> Environment: Mac OS X 10.9.5, Python 2.7.8, IPython 2.2.0, Java 
> 1.8.0_20
>Reporter: cocoatomo
>  Labels: pyspark, testing
>
> This issue is an overhaul issue to remove the problems I encountered when running 
> ./python/run-tests at commit a85f24accd3266e0f97ee04d03c22b593d99c062.
> It will have sub-tasks for the different kinds of issues.
> Contents of unit-tests.log:
> {noformat}
> {noformat}






[jira] [Updated] (SPARK-3866) Clean up python/run-tests problems

2014-10-08 Thread cocoatomo (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

cocoatomo updated SPARK-3866:
-
Description: 
This issue is an overhaul issue to remove the problems I encountered when running 
./python/run-tests at commit a85f24accd3266e0f97ee04d03c22b593d99c062.
It will have sub-tasks for the different kinds of issues.

Contents of unit-tests.log:
{noformat}
{noformat}

  was:
This issue is an overhaul issue to remove the problems I encountered when running 
./python/run-tests at commit a85f24accd3266e0f97ee04d03c22b593d99c062.
It will have sub-tasks for the different kinds of issues.


> Clean up python/run-tests problems
> --
>
> Key: SPARK-3866
> URL: https://issues.apache.org/jira/browse/SPARK-3866
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 1.2.0
> Environment: Mac OS X 10.9.5, Python 2.7.8, IPython 2.2.0
>Reporter: cocoatomo
>  Labels: pyspark, testing
>
> This issue is an overhaul issue to remove the problems I encountered when running 
> ./python/run-tests at commit a85f24accd3266e0f97ee04d03c22b593d99c062.
> It will have sub-tasks for the different kinds of issues.
> Contents of unit-tests.log:
> {noformat}
> {noformat}






[jira] [Updated] (SPARK-3866) Clean up python/run-tests problems

2014-10-08 Thread cocoatomo (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

cocoatomo updated SPARK-3866:
-
Environment: Mac OS X 10.9.5, Python 2.7.8, IPython 2.2.0, Java 1.8.0_20  
(was: Mac OS X 10.9.5, Python 2.7.8, IPython 2.2.0)

> Clean up python/run-tests problems
> --
>
> Key: SPARK-3866
> URL: https://issues.apache.org/jira/browse/SPARK-3866
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 1.2.0
> Environment: Mac OS X 10.9.5, Python 2.7.8, IPython 2.2.0, Java 
> 1.8.0_20
>Reporter: cocoatomo
>  Labels: pyspark, testing
>
> This issue is an overhaul issue to remove the problems I encountered when running 
> ./python/run-tests at commit a85f24accd3266e0f97ee04d03c22b593d99c062.
> It will have sub-tasks for the different kinds of issues.
> Contents of unit-tests.log:
> {noformat}
> {noformat}






[jira] [Created] (SPARK-3866) Clean up python/run-tests problems

2014-10-08 Thread cocoatomo (JIRA)
cocoatomo created SPARK-3866:


 Summary: Clean up python/run-tests problems
 Key: SPARK-3866
 URL: https://issues.apache.org/jira/browse/SPARK-3866
 Project: Spark
  Issue Type: Bug
  Components: PySpark
Affects Versions: 1.2.0
 Environment: Mac OS X 10.9.5, Python 2.7.8, IPython 2.2.0
Reporter: cocoatomo


This issue is an overhaul issue to remove the problems I encountered when running 
./python/run-tests at commit a85f24accd3266e0f97ee04d03c22b593d99c062.
It will have sub-tasks for the different kinds of issues.






[jira] [Commented] (SPARK-3794) Building spark core fails with specific hadoop version

2014-10-04 Thread cocoatomo (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14159384#comment-14159384
 ] 

cocoatomo commented on SPARK-3794:
--

Thank you for the comment.

Building from the root directory results in the same error.

{noformat}
$ mvn -Dhadoop.version=1.1.0 -DskipTests clean compile
...
[ERROR] 
/Users//MyRepos/Scala/spark/core/src/main/scala/org/apache/spark/util/Utils.scala:720:
 value listFilesAndDirs is not a member of object 
org.apache.commons.io.FileUtils
[ERROR]   val files = FileUtils.listFilesAndDirs(dir, TrueFileFilter.TRUE, 
TrueFileFilter.TRUE)
[ERROR] ^
[ERROR] one error found
[INFO] 
[INFO] Reactor Summary:
[INFO] 
[INFO] Spark Project Parent POM ... SUCCESS [  2.147 s]
[INFO] Spark Project Core . FAILURE [ 42.550 s]
[INFO] Spark Project Bagel  SKIPPED
[INFO] Spark Project GraphX ... SKIPPED
[INFO] Spark Project Streaming  SKIPPED
[INFO] Spark Project ML Library ... SKIPPED
[INFO] Spark Project Tools  SKIPPED
[INFO] Spark Project Catalyst . SKIPPED
[INFO] Spark Project SQL .. SKIPPED
[INFO] Spark Project Hive . SKIPPED
[INFO] Spark Project REPL . SKIPPED
[INFO] Spark Project Assembly . SKIPPED
[INFO] Spark Project External Twitter . SKIPPED
[INFO] Spark Project External Kafka ... SKIPPED
[INFO] Spark Project External Flume Sink .. SKIPPED
[INFO] Spark Project External Flume ... SKIPPED
[INFO] Spark Project External ZeroMQ .. SKIPPED
[INFO] Spark Project External MQTT  SKIPPED
[INFO] Spark Project Examples . SKIPPED
[INFO] 
[INFO] BUILD FAILURE
[INFO] 
[INFO] Total time: 45.365 s
[INFO] Finished at: 2014-10-05T10:29:48+09:00
[INFO] Final Memory: 34M/1017M
[INFO] 
[ERROR] Failed to execute goal 
net.alchim31.maven:scala-maven-plugin:3.2.0:compile (scala-compile-first) on 
project spark-core_2.10: Execution scala-compile-first of goal 
net.alchim31.maven:scala-maven-plugin:3.2.0:compile failed. CompileFailed -> 
[Help 1]
{noformat}

> Building spark core fails with specific hadoop version
> --
>
> Key: SPARK-3794
> URL: https://issues.apache.org/jira/browse/SPARK-3794
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.2.0
> Environment: Mac OS X 10.9.5
>Reporter: cocoatomo
>  Labels: spark
>
> At commit cf1d32e3e1071829b152d4b597bf0a0d7a5629a2, building Spark core 
> results in a compilation error when we specify certain Hadoop versions.
> To reproduce this issue, execute the following command with 
> hadoop.version=1.1.0, 1.1.1, 1.1.2, 1.2.0, 1.2.1, or 2.2.0.
> {noformat}
> $ cd ./core
> $ mvn -Dhadoop.version= -DskipTests clean compile
> ...
> [ERROR] 
> /Users/tomohiko/MyRepos/Scala/spark/core/src/main/scala/org/apache/spark/util/Utils.scala:720:
>  value listFilesAndDirs is not a member of object 
> org.apache.commons.io.FileUtils
> [ERROR]   val files = FileUtils.listFilesAndDirs(dir, 
> TrueFileFilter.TRUE, TrueFileFilter.TRUE)
> [ERROR] ^
> {noformat}
> Because the compilation uses commons-io version 2.1 and the 
> FileUtils#listFilesAndDirs method was added in commons-io version 2.2, this 
> compilation always fails.
> FileUtils#listFilesAndDirs → 
> http://commons.apache.org/proper/commons-io/apidocs/org/apache/commons/io/FileUtils.html#listFilesAndDirs%28java.io.File,%20org.apache.commons.io.filefilter.IOFileFilter,%20org.apache.commons.io.filefilter.IOFileFilter%29
> Because hadoop-client in those problematic versions depends on commons-io 
> 2.1, not 2.4, we should assume that commons-io is version 2.1.






[jira] [Updated] (SPARK-3794) Building spark core fails with specific hadoop version

2014-10-04 Thread cocoatomo (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

cocoatomo updated SPARK-3794:
-
Description: 
At commit cf1d32e3e1071829b152d4b597bf0a0d7a5629a2, building Spark core 
results in a compilation error when we specify certain Hadoop versions.

To reproduce this issue, execute the following command with 
hadoop.version=1.1.0, 1.1.1, 1.1.2, 1.2.0, 1.2.1, or 2.2.0.

{noformat}
$ cd ./core
$ mvn -Dhadoop.version= -DskipTests clean compile
...
[ERROR] 
/Users/tomohiko/MyRepos/Scala/spark/core/src/main/scala/org/apache/spark/util/Utils.scala:720:
 value listFilesAndDirs is not a member of object 
org.apache.commons.io.FileUtils
[ERROR]   val files = FileUtils.listFilesAndDirs(dir, TrueFileFilter.TRUE, 
TrueFileFilter.TRUE)
[ERROR] ^
{noformat}

Because the compilation uses commons-io version 2.1 and the 
FileUtils#listFilesAndDirs method was added in commons-io version 2.2, this 
compilation always fails.

FileUtils#listFilesAndDirs → 
http://commons.apache.org/proper/commons-io/apidocs/org/apache/commons/io/FileUtils.html#listFilesAndDirs%28java.io.File,%20org.apache.commons.io.filefilter.IOFileFilter,%20org.apache.commons.io.filefilter.IOFileFilter%29

Because hadoop-client in those problematic versions depends on commons-io 2.1, 
not 2.4, we should assume that commons-io is version 2.1.

  was:
At commit cf1d32e3e1071829b152d4b597bf0a0d7a5629a2, building Spark core 
results in a compilation error when we specify certain Hadoop versions.

To reproduce this issue, execute the following command with 
hadoop.version=1.1.0, 1.1.1, 1.1.2, 1.2.0, 1.2.1, or 2.2.0.

{noformat}
$ cd ./core
$ mvn -Dhadoop.version= -DskipTests clean compile
...
[ERROR] 
/Users/tomohiko/MyRepos/Scala/spark/core/src/main/scala/org/apache/spark/util/Utils.scala:720:
 value listFilesAndDirs is not a member of object 
org.apache.commons.io.FileUtils
[ERROR]   val files = FileUtils.listFilesAndDirs(dir, TrueFileFilter.TRUE, 
TrueFileFilter.TRUE)
[ERROR] ^
{noformat}

Because the compilation uses commons-io version 2.1 and the 
FileUtils#listFilesAndDirs method was added in commons-io version 2.2, this 
compilation already fails.

FileUtils#listFilesAndDirs → 
http://commons.apache.org/proper/commons-io/apidocs/org/apache/commons/io/FileUtils.html#listFilesAndDirs%28java.io.File,%20org.apache.commons.io.filefilter.IOFileFilter,%20org.apache.commons.io.filefilter.IOFileFilter%29

Because hadoop-client in those problematic versions depends on commons-io 2.1, 
not 2.4, we should assume that commons-io is version 2.1.


> Building spark core fails with specific hadoop version
> --
>
> Key: SPARK-3794
> URL: https://issues.apache.org/jira/browse/SPARK-3794
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.2.0
> Environment: Mac OS X 10.9.5
>Reporter: cocoatomo
>  Labels: spark
> Fix For: 1.2.0
>
>
> At commit cf1d32e3e1071829b152d4b597bf0a0d7a5629a2, building Spark core 
> results in a compilation error when we specify certain Hadoop versions.
> To reproduce this issue, execute the following command with 
> hadoop.version=1.1.0, 1.1.1, 1.1.2, 1.2.0, 1.2.1, or 2.2.0.
> {noformat}
> $ cd ./core
> $ mvn -Dhadoop.version= -DskipTests clean compile
> ...
> [ERROR] 
> /Users/tomohiko/MyRepos/Scala/spark/core/src/main/scala/org/apache/spark/util/Utils.scala:720:
>  value listFilesAndDirs is not a member of object 
> org.apache.commons.io.FileUtils
> [ERROR]   val files = FileUtils.listFilesAndDirs(dir, 
> TrueFileFilter.TRUE, TrueFileFilter.TRUE)
> [ERROR] ^
> {noformat}
> Because the compilation uses commons-io version 2.1 and the 
> FileUtils#listFilesAndDirs method was added in commons-io version 2.2, this 
> compilation always fails.
> FileUtils#listFilesAndDirs → 
> http://commons.apache.org/proper/commons-io/apidocs/org/apache/commons/io/FileUtils.html#listFilesAndDirs%28java.io.File,%20org.apache.commons.io.filefilter.IOFileFilter,%20org.apache.commons.io.filefilter.IOFileFilter%29
> Because hadoop-client in those problematic versions depends on commons-io 
> 2.1, not 2.4, we should assume that commons-io is version 2.1.






[jira] [Created] (SPARK-3794) Building spark core fails with specific hadoop version

2014-10-04 Thread cocoatomo (JIRA)
cocoatomo created SPARK-3794:


 Summary: Building spark core fails with specific hadoop version
 Key: SPARK-3794
 URL: https://issues.apache.org/jira/browse/SPARK-3794
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.2.0
 Environment: Mac OS X 10.9.5
Reporter: cocoatomo
 Fix For: 1.2.0


At commit cf1d32e3e1071829b152d4b597bf0a0d7a5629a2, building Spark core 
results in a compilation error when we specify certain Hadoop versions.

To reproduce this issue, execute the following command with 
hadoop.version=1.1.0, 1.1.1, 1.1.2, 1.2.0, 1.2.1, or 2.2.0.

{noformat}
$ cd ./core
$ mvn -Dhadoop.version= -DskipTests clean compile
...
[ERROR] 
/Users/tomohiko/MyRepos/Scala/spark/core/src/main/scala/org/apache/spark/util/Utils.scala:720:
 value listFilesAndDirs is not a member of object 
org.apache.commons.io.FileUtils
[ERROR]   val files = FileUtils.listFilesAndDirs(dir, TrueFileFilter.TRUE, 
TrueFileFilter.TRUE)
[ERROR] ^
{noformat}

Because the compilation uses commons-io version 2.1 and the 
FileUtils#listFilesAndDirs method was added in commons-io version 2.2, this 
compilation already fails.

FileUtils#listFilesAndDirs → 
http://commons.apache.org/proper/commons-io/apidocs/org/apache/commons/io/FileUtils.html#listFilesAndDirs%28java.io.File,%20org.apache.commons.io.filefilter.IOFileFilter,%20org.apache.commons.io.filefilter.IOFileFilter%29

Because hadoop-client in those problematic versions depends on commons-io 2.1, 
not 2.4, we should assume that commons-io is version 2.1.






[jira] [Commented] (SPARK-3773) Sphinx build warnings

2014-10-02 Thread cocoatomo (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14157590#comment-14157590
 ] 

cocoatomo commented on SPARK-3773:
--

Using Sphinx to generate API docs for PySpark

> Sphinx build warnings
> -
>
> Key: SPARK-3773
> URL: https://issues.apache.org/jira/browse/SPARK-3773
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 1.2.0
> Environment: Mac OS X 10.9.5, Python 2.7.8, IPython 2.2.0, 
> Jinja2==2.7.3, MarkupSafe==0.23, Pygments==1.6, Sphinx==1.2.3, 
> docutils==0.12, numpy==1.9.0
>Reporter: cocoatomo
>Priority: Minor
>  Labels: docs, docstrings, pyspark
>
> When building Sphinx documents for PySpark, we get 12 warnings.
> Their causes are mostly docstrings in broken ReST format.
> To reproduce this issue, run the following commands at commit 
> 6e27cb630de69fa5acb510b4e2f6b980742b1957.
> {quote}
> $ cd ./python/docs
> $ make clean html
> ...
> /Users//MyRepos/Scala/spark/python/pyspark/__init__.py:docstring of 
> pyspark.SparkContext.sequenceFile:4: ERROR: Unexpected indentation.
> /Users//MyRepos/Scala/spark/python/pyspark/__init__.py:docstring of 
> pyspark.RDD.saveAsSequenceFile:4: ERROR: Unexpected indentation.
> /Users//MyRepos/Scala/spark/python/pyspark/mllib/classification.py:docstring
>  of pyspark.mllib.classification.LogisticRegressionWithSGD.train:14: ERROR: 
> Unexpected indentation.
> /Users//MyRepos/Scala/spark/python/pyspark/mllib/classification.py:docstring
>  of pyspark.mllib.classification.LogisticRegressionWithSGD.train:16: WARNING: 
> Definition list ends without a blank line; unexpected unindent.
> /Users//MyRepos/Scala/spark/python/pyspark/mllib/classification.py:docstring
>  of pyspark.mllib.classification.LogisticRegressionWithSGD.train:17: WARNING: 
> Block quote ends without a blank line; unexpected unindent.
> /Users//MyRepos/Scala/spark/python/pyspark/mllib/classification.py:docstring
>  of pyspark.mllib.classification.SVMWithSGD.train:14: ERROR: Unexpected 
> indentation.
> /Users//MyRepos/Scala/spark/python/pyspark/mllib/classification.py:docstring
>  of pyspark.mllib.classification.SVMWithSGD.train:16: WARNING: Definition 
> list ends without a blank line; unexpected unindent.
> /Users//MyRepos/Scala/spark/python/pyspark/mllib/classification.py:docstring
>  of pyspark.mllib.classification.SVMWithSGD.train:17: WARNING: Block quote 
> ends without a blank line; unexpected unindent.
> /Users//MyRepos/Scala/spark/python/docs/pyspark.mllib.rst:50: WARNING: 
> missing attribute mentioned in :members: or __all__: module 
> pyspark.mllib.regression, attribute 
> RidgeRegressionModelLinearRegressionWithSGD
> /Users//MyRepos/Scala/spark/python/pyspark/mllib/tree.py:docstring of 
> pyspark.mllib.tree.DecisionTreeModel.predict:3: ERROR: Unexpected indentation.
> ...
> checking consistency... 
> /Users//MyRepos/Scala/spark/python/docs/modules.rst:: WARNING: document 
> isn't included in any toctree
> ...
> copying static files... WARNING: html_static_path entry 
> u'/Users//MyRepos/Scala/spark/python/docs/_static' does not exist
> ...
> build succeeded, 12 warnings.
> {quote}






[jira] [Updated] (SPARK-3773) Sphinx build warnings

2014-10-02 Thread cocoatomo (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

cocoatomo updated SPARK-3773:
-
Description: 
When building Sphinx documents for PySpark, we get 12 warnings.
Their causes are mostly docstrings in broken ReST format.

To reproduce this issue, run the following commands at commit 
6e27cb630de69fa5acb510b4e2f6b980742b1957.

{quote}
$ cd ./python/docs
$ make clean html
...
/Users//MyRepos/Scala/spark/python/pyspark/__init__.py:docstring of 
pyspark.SparkContext.sequenceFile:4: ERROR: Unexpected indentation.
/Users//MyRepos/Scala/spark/python/pyspark/__init__.py:docstring of 
pyspark.RDD.saveAsSequenceFile:4: ERROR: Unexpected indentation.
/Users//MyRepos/Scala/spark/python/pyspark/mllib/classification.py:docstring
 of pyspark.mllib.classification.LogisticRegressionWithSGD.train:14: ERROR: 
Unexpected indentation.
/Users//MyRepos/Scala/spark/python/pyspark/mllib/classification.py:docstring
 of pyspark.mllib.classification.LogisticRegressionWithSGD.train:16: WARNING: 
Definition list ends without a blank line; unexpected unindent.
/Users//MyRepos/Scala/spark/python/pyspark/mllib/classification.py:docstring
 of pyspark.mllib.classification.LogisticRegressionWithSGD.train:17: WARNING: 
Block quote ends without a blank line; unexpected unindent.
/Users//MyRepos/Scala/spark/python/pyspark/mllib/classification.py:docstring
 of pyspark.mllib.classification.SVMWithSGD.train:14: ERROR: Unexpected 
indentation.
/Users//MyRepos/Scala/spark/python/pyspark/mllib/classification.py:docstring
 of pyspark.mllib.classification.SVMWithSGD.train:16: WARNING: Definition list 
ends without a blank line; unexpected unindent.
/Users//MyRepos/Scala/spark/python/pyspark/mllib/classification.py:docstring
 of pyspark.mllib.classification.SVMWithSGD.train:17: WARNING: Block quote ends 
without a blank line; unexpected unindent.
/Users//MyRepos/Scala/spark/python/docs/pyspark.mllib.rst:50: WARNING: 
missing attribute mentioned in :members: or __all__: module 
pyspark.mllib.regression, attribute RidgeRegressionModelLinearRegressionWithSGD
/Users//MyRepos/Scala/spark/python/pyspark/mllib/tree.py:docstring of 
pyspark.mllib.tree.DecisionTreeModel.predict:3: ERROR: Unexpected indentation.
...
checking consistency... 
/Users//MyRepos/Scala/spark/python/docs/modules.rst:: WARNING: document 
isn't included in any toctree
...
copying static files... WARNING: html_static_path entry 
u'/Users//MyRepos/Scala/spark/python/docs/_static' does not exist
...
build succeeded, 12 warnings.
{quote}


  was:
When building Sphinx documents for PySpark, we get 12 warnings.
Their causes are mostly docstrings in broken ReST format.

To reproduce this issue, run the following commands.

{quote}
$ cd ./python/docs
$ make clean html
...
/Users//MyRepos/Scala/spark/python/pyspark/__init__.py:docstring of 
pyspark.SparkContext.sequenceFile:4: ERROR: Unexpected indentation.
/Users//MyRepos/Scala/spark/python/pyspark/__init__.py:docstring of 
pyspark.RDD.saveAsSequenceFile:4: ERROR: Unexpected indentation.
/Users//MyRepos/Scala/spark/python/pyspark/mllib/classification.py:docstring
 of pyspark.mllib.classification.LogisticRegressionWithSGD.train:14: ERROR: 
Unexpected indentation.
/Users//MyRepos/Scala/spark/python/pyspark/mllib/classification.py:docstring
 of pyspark.mllib.classification.LogisticRegressionWithSGD.train:16: WARNING: 
Definition list ends without a blank line; unexpected unindent.
/Users//MyRepos/Scala/spark/python/pyspark/mllib/classification.py:docstring
 of pyspark.mllib.classification.LogisticRegressionWithSGD.train:17: WARNING: 
Block quote ends without a blank line; unexpected unindent.
/Users//MyRepos/Scala/spark/python/pyspark/mllib/classification.py:docstring
 of pyspark.mllib.classification.SVMWithSGD.train:14: ERROR: Unexpected 
indentation.
/Users//MyRepos/Scala/spark/python/pyspark/mllib/classification.py:docstring
 of pyspark.mllib.classification.SVMWithSGD.train:16: WARNING: Definition list 
ends without a blank line; unexpected unindent.
/Users//MyRepos/Scala/spark/python/pyspark/mllib/classification.py:docstring
 of pyspark.mllib.classification.SVMWithSGD.train:17: WARNING: Block quote ends 
without a blank line; unexpected unindent.
/Users//MyRepos/Scala/spark/python/docs/pyspark.mllib.rst:50: WARNING: 
missing attribute mentioned in :members: or __all__: module 
pyspark.mllib.regression, attribute RidgeRegressionModelLinearRegressionWithSGD
/Users//MyRepos/Scala/spark/python/pyspark/mllib/tree.py:docstring of 
pyspark.mllib.tree.DecisionTreeModel.predict:3: ERROR: Unexpected indentation.
...
checking consistency... 
/Users//MyRepos/Scala/spark/python/docs/modules.rst:: WARNING: document 
isn't included in any toctree
...
copying static files... WARNING: html_static_path entry 
u'/Users//MyRepos/Scala/spark/python/docs/_static' does not exist
...
build succeeded, 12 warnings.
{quote}



> Sphinx build warning

[jira] [Commented] (SPARK-3772) RDD operation on IPython REPL failed with an illegal port number

2014-10-02 Thread cocoatomo (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14157586#comment-14157586
 ] 

cocoatomo commented on SPARK-3772:
--

Thank you for the advice.

I added the commit hash to the description.

> RDD operation on IPython REPL failed with an illegal port number
> 
>
> Key: SPARK-3772
> URL: https://issues.apache.org/jira/browse/SPARK-3772
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 1.2.0
> Environment: Mac OS X 10.9.5, Python 2.7.8, IPython 2.2.0
>Reporter: cocoatomo
>  Labels: pyspark
>
> To reproduce this issue, execute the following commands at commit 
> 6e27cb630de69fa5acb510b4e2f6b980742b1957.
> {quote}
> $ PYSPARK_PYTHON=ipython ./bin/pyspark
> ...
> In [1]: file = sc.textFile('README.md')
> In [2]: file.first()
> ...
> 14/10/03 08:50:13 WARN NativeCodeLoader: Unable to load native-hadoop library 
> for your platform... using builtin-java classes where applicable
> 14/10/03 08:50:13 WARN LoadSnappy: Snappy native library not loaded
> 14/10/03 08:50:13 INFO FileInputFormat: Total input paths to process : 1
> 14/10/03 08:50:13 INFO SparkContext: Starting job: runJob at 
> PythonRDD.scala:334
> 14/10/03 08:50:13 INFO DAGScheduler: Got job 0 (runJob at 
> PythonRDD.scala:334) with 1 output partitions (allowLocal=true)
> 14/10/03 08:50:13 INFO DAGScheduler: Final stage: Stage 0(runJob at 
> PythonRDD.scala:334)
> 14/10/03 08:50:13 INFO DAGScheduler: Parents of final stage: List()
> 14/10/03 08:50:13 INFO DAGScheduler: Missing parents: List()
> 14/10/03 08:50:13 INFO DAGScheduler: Submitting Stage 0 (PythonRDD[2] at RDD 
> at PythonRDD.scala:44), which has no missing parents
> 14/10/03 08:50:13 INFO MemoryStore: ensureFreeSpace(4456) called with 
> curMem=57388, maxMem=278019440
> 14/10/03 08:50:13 INFO MemoryStore: Block broadcast_1 stored as values in 
> memory (estimated size 4.4 KB, free 265.1 MB)
> 14/10/03 08:50:13 INFO DAGScheduler: Submitting 1 missing tasks from Stage 0 
> (PythonRDD[2] at RDD at PythonRDD.scala:44)
> 14/10/03 08:50:13 INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
> 14/10/03 08:50:13 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, 
> localhost, PROCESS_LOCAL, 1207 bytes)
> 14/10/03 08:50:13 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
> 14/10/03 08:50:14 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
> java.lang.IllegalArgumentException: port out of range:1027423549
>   at java.net.InetSocketAddress.checkPort(InetSocketAddress.java:143)
>   at java.net.InetSocketAddress.<init>(InetSocketAddress.java:188)
>   at java.net.Socket.<init>(Socket.java:244)
>   at 
> org.apache.spark.api.python.PythonWorkerFactory.createSocket$1(PythonWorkerFactory.scala:75)
>   at 
> org.apache.spark.api.python.PythonWorkerFactory.liftedTree1$1(PythonWorkerFactory.scala:90)
>   at 
> org.apache.spark.api.python.PythonWorkerFactory.createThroughDaemon(PythonWorkerFactory.scala:89)
>   at 
> org.apache.spark.api.python.PythonWorkerFactory.create(PythonWorkerFactory.scala:62)
>   at org.apache.spark.SparkEnv.createPythonWorker(SparkEnv.scala:100)
>   at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:71)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
>   at org.apache.spark.scheduler.Task.run(Task.scala:56)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:182)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:744)
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-3772) RDD operation on IPython REPL failed with an illegal port number

2014-10-02 Thread cocoatomo (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

cocoatomo updated SPARK-3772:
-
Description: 
To reproduce this issue, execute the following commands on the commit 
6e27cb630de69fa5acb510b4e2f6b980742b1957.

{quote}
$ PYSPARK_PYTHON=ipython ./bin/pyspark
...
In [1]: file = sc.textFile('README.md')
In [2]: file.first()
...
14/10/03 08:50:13 WARN NativeCodeLoader: Unable to load native-hadoop library 
for your platform... using builtin-java classes where applicable
14/10/03 08:50:13 WARN LoadSnappy: Snappy native library not loaded
14/10/03 08:50:13 INFO FileInputFormat: Total input paths to process : 1
14/10/03 08:50:13 INFO SparkContext: Starting job: runJob at PythonRDD.scala:334
14/10/03 08:50:13 INFO DAGScheduler: Got job 0 (runJob at PythonRDD.scala:334) 
with 1 output partitions (allowLocal=true)
14/10/03 08:50:13 INFO DAGScheduler: Final stage: Stage 0(runJob at 
PythonRDD.scala:334)
14/10/03 08:50:13 INFO DAGScheduler: Parents of final stage: List()
14/10/03 08:50:13 INFO DAGScheduler: Missing parents: List()
14/10/03 08:50:13 INFO DAGScheduler: Submitting Stage 0 (PythonRDD[2] at RDD at 
PythonRDD.scala:44), which has no missing parents
14/10/03 08:50:13 INFO MemoryStore: ensureFreeSpace(4456) called with 
curMem=57388, maxMem=278019440
14/10/03 08:50:13 INFO MemoryStore: Block broadcast_1 stored as values in 
memory (estimated size 4.4 KB, free 265.1 MB)
14/10/03 08:50:13 INFO DAGScheduler: Submitting 1 missing tasks from Stage 0 
(PythonRDD[2] at RDD at PythonRDD.scala:44)
14/10/03 08:50:13 INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
14/10/03 08:50:13 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, 
localhost, PROCESS_LOCAL, 1207 bytes)
14/10/03 08:50:13 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
14/10/03 08:50:14 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
java.lang.IllegalArgumentException: port out of range:1027423549
at java.net.InetSocketAddress.checkPort(InetSocketAddress.java:143)
at java.net.InetSocketAddress.<init>(InetSocketAddress.java:188)
at java.net.Socket.<init>(Socket.java:244)
at 
org.apache.spark.api.python.PythonWorkerFactory.createSocket$1(PythonWorkerFactory.scala:75)
at 
org.apache.spark.api.python.PythonWorkerFactory.liftedTree1$1(PythonWorkerFactory.scala:90)
at 
org.apache.spark.api.python.PythonWorkerFactory.createThroughDaemon(PythonWorkerFactory.scala:89)
at 
org.apache.spark.api.python.PythonWorkerFactory.create(PythonWorkerFactory.scala:62)
at org.apache.spark.SparkEnv.createPythonWorker(SparkEnv.scala:100)
at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:71)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
at org.apache.spark.scheduler.Task.run(Task.scala:56)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:182)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:744)
{quote}

  was:
To reproduce this issue, execute the following commands.

{quote}
$ PYSPARK_PYTHON=ipython ./bin/pyspark
...
In [1]: file = sc.textFile('README.md')
In [2]: file.first()
...
14/10/03 08:50:13 WARN NativeCodeLoader: Unable to load native-hadoop library 
for your platform... using builtin-java classes where applicable
14/10/03 08:50:13 WARN LoadSnappy: Snappy native library not loaded
14/10/03 08:50:13 INFO FileInputFormat: Total input paths to process : 1
14/10/03 08:50:13 INFO SparkContext: Starting job: runJob at PythonRDD.scala:334
14/10/03 08:50:13 INFO DAGScheduler: Got job 0 (runJob at PythonRDD.scala:334) 
with 1 output partitions (allowLocal=true)
14/10/03 08:50:13 INFO DAGScheduler: Final stage: Stage 0(runJob at 
PythonRDD.scala:334)
14/10/03 08:50:13 INFO DAGScheduler: Parents of final stage: List()
14/10/03 08:50:13 INFO DAGScheduler: Missing parents: List()
14/10/03 08:50:13 INFO DAGScheduler: Submitting Stage 0 (PythonRDD[2] at RDD at 
PythonRDD.scala:44), which has no missing parents
14/10/03 08:50:13 INFO MemoryStore: ensureFreeSpace(4456) called with 
curMem=57388, maxMem=278019440
14/10/03 08:50:13 INFO MemoryStore: Block broadcast_1 stored as values in 
memory (estimated size 4.4 KB, free 265.1 MB)
14/10/03 08:50:13 INFO DAGScheduler: Submitting 1 missing tasks from Stage 0 
(PythonRDD[2] at RDD at PythonRDD.scala:44)
14/10/03 08:50:13 INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
14/10/03 08:50:13 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, 
localhost, PROCESS_LOCAL, 1207 bytes)
14/10/03 08:50:13 INFO Executor: Running task 0.0

[jira] [Created] (SPARK-3773) Sphinx build warnings

2014-10-02 Thread cocoatomo (JIRA)
cocoatomo created SPARK-3773:


 Summary: Sphinx build warnings
 Key: SPARK-3773
 URL: https://issues.apache.org/jira/browse/SPARK-3773
 Project: Spark
  Issue Type: Bug
  Components: PySpark
Affects Versions: 1.2.0
 Environment: Mac OS X 10.9.5, Python 2.7.8, IPython 2.2.0, 
Jinja2==2.7.3, MarkupSafe==0.23, Pygments==1.6, Sphinx==1.2.3, docutils==0.12, 
numpy==1.9.0
Reporter: cocoatomo
Priority: Minor


When building the Sphinx documents for PySpark, we get 12 warnings.
Most of them are caused by docstrings written in broken ReST format (a minimal 
example of such a docstring is shown after the build output below).

To reproduce this issue, run the following commands.

{quote}
$ cd ./python/docs
$ make clean html
...
/Users//MyRepos/Scala/spark/python/pyspark/__init__.py:docstring of 
pyspark.SparkContext.sequenceFile:4: ERROR: Unexpected indentation.
/Users//MyRepos/Scala/spark/python/pyspark/__init__.py:docstring of 
pyspark.RDD.saveAsSequenceFile:4: ERROR: Unexpected indentation.
/Users//MyRepos/Scala/spark/python/pyspark/mllib/classification.py:docstring
 of pyspark.mllib.classification.LogisticRegressionWithSGD.train:14: ERROR: 
Unexpected indentation.
/Users//MyRepos/Scala/spark/python/pyspark/mllib/classification.py:docstring
 of pyspark.mllib.classification.LogisticRegressionWithSGD.train:16: WARNING: 
Definition list ends without a blank line; unexpected unindent.
/Users//MyRepos/Scala/spark/python/pyspark/mllib/classification.py:docstring
 of pyspark.mllib.classification.LogisticRegressionWithSGD.train:17: WARNING: 
Block quote ends without a blank line; unexpected unindent.
/Users//MyRepos/Scala/spark/python/pyspark/mllib/classification.py:docstring
 of pyspark.mllib.classification.SVMWithSGD.train:14: ERROR: Unexpected 
indentation.
/Users//MyRepos/Scala/spark/python/pyspark/mllib/classification.py:docstring
 of pyspark.mllib.classification.SVMWithSGD.train:16: WARNING: Definition list 
ends without a blank line; unexpected unindent.
/Users//MyRepos/Scala/spark/python/pyspark/mllib/classification.py:docstring
 of pyspark.mllib.classification.SVMWithSGD.train:17: WARNING: Block quote ends 
without a blank line; unexpected unindent.
/Users//MyRepos/Scala/spark/python/docs/pyspark.mllib.rst:50: WARNING: 
missing attribute mentioned in :members: or __all__: module 
pyspark.mllib.regression, attribute RidgeRegressionModelLinearRegressionWithSGD
/Users//MyRepos/Scala/spark/python/pyspark/mllib/tree.py:docstring of 
pyspark.mllib.tree.DecisionTreeModel.predict:3: ERROR: Unexpected indentation.
...
checking consistency... 
/Users//MyRepos/Scala/spark/python/docs/modules.rst:: WARNING: document 
isn't included in any toctree
...
copying static files... WARNING: html_static_path entry 
u'/Users//MyRepos/Scala/spark/python/docs/_static' does not exist
...
build succeeded, 12 warnings.
{quote}
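
A minimal, hypothetical example of the kind of docstring formatting that produces 
warnings like "Unexpected indentation." and "Block quote ends without a blank 
line; unexpected unindent." (the actual offenders are the docstrings listed in the 
build output above, e.g. in pyspark/mllib/classification.py):

{noformat}
# Broken: an indented block directly after a paragraph, and a dedent without a
# blank line; docutils flags both while Sphinx builds the docs.
def train_broken(data, iterations=100):
    """Train a model with stochastic gradient descent.
    Parameters:
        data -- the training data (an RDD of LabeledPoint)
        iterations -- number of iterations (default: 100)
    Returns a trained model.
    """

# Fixed: blank lines before and after the indented block keep the ReST valid.
def train_fixed(data, iterations=100):
    """Train a model with stochastic gradient descent.

    Parameters:

        data -- the training data (an RDD of LabeledPoint)
        iterations -- number of iterations (default: 100)

    Returns a trained model.
    """
{noformat}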




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-3772) RDD operation on IPython REPL failed with an illegal port number

2014-10-02 Thread cocoatomo (JIRA)
cocoatomo created SPARK-3772:


 Summary: RDD operation on IPython REPL failed with an illegal port 
number
 Key: SPARK-3772
 URL: https://issues.apache.org/jira/browse/SPARK-3772
 Project: Spark
  Issue Type: Bug
  Components: PySpark
Affects Versions: 1.2.0
 Environment: Mac OS X 10.9.5, Python 2.7.8, IPython 2.2.0
Reporter: cocoatomo


To reproduce this issue, execute the following commands.

{quote}
$ PYSPARK_PYTHON=ipython ./bin/pyspark
...
In [1]: file = sc.textFile('README.md')
In [2]: file.first()
...
14/10/03 08:50:13 WARN NativeCodeLoader: Unable to load native-hadoop library 
for your platform... using builtin-java classes where applicable
14/10/03 08:50:13 WARN LoadSnappy: Snappy native library not loaded
14/10/03 08:50:13 INFO FileInputFormat: Total input paths to process : 1
14/10/03 08:50:13 INFO SparkContext: Starting job: runJob at PythonRDD.scala:334
14/10/03 08:50:13 INFO DAGScheduler: Got job 0 (runJob at PythonRDD.scala:334) 
with 1 output partitions (allowLocal=true)
14/10/03 08:50:13 INFO DAGScheduler: Final stage: Stage 0(runJob at 
PythonRDD.scala:334)
14/10/03 08:50:13 INFO DAGScheduler: Parents of final stage: List()
14/10/03 08:50:13 INFO DAGScheduler: Missing parents: List()
14/10/03 08:50:13 INFO DAGScheduler: Submitting Stage 0 (PythonRDD[2] at RDD at 
PythonRDD.scala:44), which has no missing parents
14/10/03 08:50:13 INFO MemoryStore: ensureFreeSpace(4456) called with 
curMem=57388, maxMem=278019440
14/10/03 08:50:13 INFO MemoryStore: Block broadcast_1 stored as values in 
memory (estimated size 4.4 KB, free 265.1 MB)
14/10/03 08:50:13 INFO DAGScheduler: Submitting 1 missing tasks from Stage 0 
(PythonRDD[2] at RDD at PythonRDD.scala:44)
14/10/03 08:50:13 INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
14/10/03 08:50:13 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, 
localhost, PROCESS_LOCAL, 1207 bytes)
14/10/03 08:50:13 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
14/10/03 08:50:14 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
java.lang.IllegalArgumentException: port out of range:1027423549
at java.net.InetSocketAddress.checkPort(InetSocketAddress.java:143)
at java.net.InetSocketAddress.<init>(InetSocketAddress.java:188)
at java.net.Socket.<init>(Socket.java:244)
at 
org.apache.spark.api.python.PythonWorkerFactory.createSocket$1(PythonWorkerFactory.scala:75)
at 
org.apache.spark.api.python.PythonWorkerFactory.liftedTree1$1(PythonWorkerFactory.scala:90)
at 
org.apache.spark.api.python.PythonWorkerFactory.createThroughDaemon(PythonWorkerFactory.scala:89)
at 
org.apache.spark.api.python.PythonWorkerFactory.create(PythonWorkerFactory.scala:62)
at org.apache.spark.SparkEnv.createPythonWorker(SparkEnv.scala:100)
at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:71)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
at org.apache.spark.scheduler.Task.run(Task.scala:56)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:182)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:744)
{quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3706) Cannot run IPython REPL with IPYTHON set to "1" and PYSPARK_PYTHON unset

2014-10-02 Thread cocoatomo (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156776#comment-14156776
 ] 

cocoatomo commented on SPARK-3706:
--

Thank you for the comment and modification, [~joshrosen].

Taking a quick look, this regression was introduced in the commit 
[f38fab97c7970168f1bd81d4dc202e36322c95e3|https://github.com/apache/spark/commit/f38fab97c7970168f1bd81d4dc202e36322c95e3#diff-5dbcb82caf8131d60c73e82cf8d12d8aR107]
 on the master branch.
Pushing "ipython" aside into a default value forces us to set PYSPARK_PYTHON to 
"ipython", since PYSPARK_PYTHON defaults to "python" at the top of the 
./bin/pyspark script.
This issue is a regression between 1.1.0 and 1.2.0 and therefore affects only 
1.2.0.

> Cannot run IPython REPL with IPYTHON set to "1" and PYSPARK_PYTHON unset
> 
>
> Key: SPARK-3706
> URL: https://issues.apache.org/jira/browse/SPARK-3706
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 1.2.0
> Environment: Mac OS X 10.9.5, Python 2.7.8, IPython 2.2.0
>Reporter: cocoatomo
>  Labels: pyspark
>
> h3. Problem
> The section "Using the shell" in Spark Programming Guide 
> (https://spark.apache.org/docs/latest/programming-guide.html#using-the-shell) 
> says that we can run pyspark REPL through IPython.
> But the following command runs the default Python executable instead of IPython.
> {quote}
> $ IPYTHON=1 ./bin/pyspark
> Python 2.7.8 (default, Jul  2 2014, 10:14:46) 
> ...
> {quote}
> the spark/bin/pyspark script at commit 
> b235e013638685758885842dc3268e9800af3678 decides which executable and options 
> to use in the following way.
> # if PYSPARK_PYTHON unset
> #* → defaulting to "python"
> # if IPYTHON_OPTS set
> #* → set IPYTHON "1"
> # some Python script passed to ./bin/pyspark → run it with ./bin/spark-submit
> #* out of this issue's scope
> # if IPYTHON set as "1"
> #* → execute $PYSPARK_PYTHON (default: ipython) with arguments $IPYTHON_OPTS
> #* otherwise execute $PYSPARK_PYTHON
> Therefore, when PYSPARK_PYTHON is unset, python is executed even though IPYTHON 
> is "1".
> In other words, when PYSPARK_PYTHON is unset, IPYTHON_OPTS and IPYTHON have no 
> effect on deciding which command to use.
> ||PYSPARK_PYTHON||IPYTHON_OPTS||IPYTHON||resulting command||expected command||
> |(unset → defaults to python)|(unset)|(unset)|python|(same)|
> |(unset → defaults to python)|(unset)|1|python|ipython|
> |(unset → defaults to python)|an_option|(unset → set to 1)|python an_option|ipython an_option|
> |(unset → defaults to python)|an_option|1|python an_option|ipython an_option|
> |ipython|(unset)|(unset)|ipython|(same)|
> |ipython|(unset)|1|ipython|(same)|
> |ipython|an_option|(unset → set to 1)|ipython an_option|(same)|
> |ipython|an_option|1|ipython an_option|(same)|
> h3. Suggestion
> The pyspark script should first determine whether the user wants to run 
> IPython or another executable.
> # if IPYTHON_OPTS set
> #* set IPYTHON "1"
> # if IPYTHON has a value "1"
> #* PYSPARK_PYTHON defaults to "ipython" if not set
> # PYSPARK_PYTHON defaults to "python" if not set
> See the pull request for more detailed modification.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3420) Using Sphinx to generate API docs for PySpark

2014-09-28 Thread cocoatomo (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14151073#comment-14151073
 ] 

cocoatomo commented on SPARK-3420:
--

Thank you for the comment.

Yes, I am interested in this work.

As a question for confirmation: do the words "improve the docs" mean "remove the 
ReST errors"?
Or does that work also include making the documents more complete and 
useful?

> Using Sphinx to generate API docs for PySpark
> -
>
> Key: SPARK-3420
> URL: https://issues.apache.org/jira/browse/SPARK-3420
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Reporter: Davies Liu
>Assignee: Davies Liu
>
> Sphinx can generate better documents than epydoc, so let's move on to Sphinx.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3420) Using Sphinx to generate API docs for PySpark

2014-09-27 Thread cocoatomo (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14150592#comment-14150592
 ] 

cocoatomo commented on SPARK-3420:
--

Do you mean to generate API documents using sphinx-apidoc?

When I try to build the documents using sphinx-apidoc, I get some import errors 
and ReST format errors.
I prefer Sphinx to Epydoc, so I want to fix those errors.

> Using Sphinx to generate API docs for PySpark
> -
>
> Key: SPARK-3420
> URL: https://issues.apache.org/jira/browse/SPARK-3420
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Reporter: Davies Liu
>Assignee: Davies Liu
>
> Sphinx can generate better documents than epydoc, so let's move on to Sphinx.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-3706) Cannot run IPython REPL with IPYTHON set to "1" and PYSPARK_PYTHON unset

2014-09-26 Thread cocoatomo (JIRA)
cocoatomo created SPARK-3706:


 Summary: Cannot run IPython REPL with IPYTHON set to "1" and 
PYSPARK_PYTHON unset
 Key: SPARK-3706
 URL: https://issues.apache.org/jira/browse/SPARK-3706
 Project: Spark
  Issue Type: Bug
  Components: PySpark
Affects Versions: 1.1.0
 Environment: Mac OS X 10.9.5, Python 2.7.8, IPython 2.2.0
Reporter: cocoatomo


h3. Problem

The section "Using the shell" in Spark Programming Guide 
(https://spark.apache.org/docs/latest/programming-guide.html#using-the-shell) 
says that we can run pyspark REPL through IPython.
But the following command runs the default Python executable instead of IPython.

{quote}
$ IPYTHON=1 ./bin/pyspark
Python 2.7.8 (default, Jul  2 2014, 10:14:46) 
...
{quote}

the spark/bin/pyspark script at commit 
b235e013638685758885842dc3268e9800af3678 decides which executable and options 
to use in the following way.

# if PYSPARK_PYTHON unset
#* → defaulting to "python"
# if IPYTHON_OPTS set
#* → set IPYTHON "1"
# some Python script passed to ./bin/pyspark → run it with ./bin/spark-submit
#* out of this issue's scope
# if IPYTHON set as "1"
#* → execute $PYSPARK_PYTHON (default: ipython) with arguments $IPYTHON_OPTS
#* otherwise execute $PYSPARK_PYTHON

Therefore, when PYSPARK_PYTHON is unset, python is executed even though IPYTHON 
is "1".
In other words, when PYSPARK_PYTHON is unset, IPYTHON_OPTS and IPYTHON have no 
effect on deciding which command to use.

||PYSPARK_PYTHON||IPYTHON_OPTS||IPYTHON||resulting command||expected command||
|(unset → defaults to python)|(unset)|(unset)|python|(same)|
|(unset → defaults to python)|(unset)|1|python|ipython|
|(unset → defaults to python)|an_option|(unset → set to 1)|python an_option|ipython an_option|
|(unset → defaults to python)|an_option|1|python an_option|ipython an_option|
|ipython|(unset)|(unset)|ipython|(same)|
|ipython|(unset)|1|ipython|(same)|
|ipython|an_option|(unset → set to 1)|ipython an_option|(same)|
|ipython|an_option|1|ipython an_option|(same)|


h3. Suggestion

The pyspark script should first determine whether the user wants to run IPython 
or another executable.

# if IPYTHON_OPTS set
#* set IPYTHON "1"
# if IPYTHON has a value "1"
#* PYSPARK_PYTHON defaults to "ipython" if not set
# PYSPARK_PYTHON defaults to "python" if not set

See the pull request for more detailed modification.
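
A minimal sketch of that ordering (a hypothetical Python helper just to make the 
expected behaviour explicit; the real change belongs in the ./bin/pyspark shell 
script):

{noformat}
# Hypothetical helper mirroring the suggested decision order; it only resolves
# which command the script should run, it does not launch anything.
def resolve_pyspark_command(env):
    env = dict(env)
    # 1. IPYTHON_OPTS set implies the user wants IPython.
    if env.get("IPYTHON_OPTS"):
        env["IPYTHON"] = "1"
    # 2. If IPython was requested, PYSPARK_PYTHON defaults to "ipython".
    if env.get("IPYTHON") == "1":
        env.setdefault("PYSPARK_PYTHON", "ipython")
    # 3. Otherwise PYSPARK_PYTHON defaults to "python".
    env.setdefault("PYSPARK_PYTHON", "python")
    return " ".join([env["PYSPARK_PYTHON"], env.get("IPYTHON_OPTS", "")]).strip()

# Expected commands from the table above:
assert resolve_pyspark_command({}) == "python"
assert resolve_pyspark_command({"IPYTHON": "1"}) == "ipython"
assert resolve_pyspark_command({"IPYTHON_OPTS": "an_option"}) == "ipython an_option"
assert resolve_pyspark_command({"PYSPARK_PYTHON": "ipython"}) == "ipython"
{noformat}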



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org