[jira] [Updated] (SPARK-2953) Allow using short names for io compression codecs

2014-08-10 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin updated SPARK-2953:
---

Issue Type: Improvement  (was: Bug)

 Allow using short names for io compression codecs
 -

 Key: SPARK-2953
 URL: https://issues.apache.org/jira/browse/SPARK-2953
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Reporter: Reynold Xin
Assignee: Reynold Xin

 Instead of requiring org.apache.spark.io.LZ4CompressionCodec, it is easier 
 to just accept lz4, lzf, snappy.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-2953) Allow using short names for io compression codecs

2014-08-10 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin updated SPARK-2953:
---

Component/s: Spark Core

 Allow using short names for io compression codecs
 -

 Key: SPARK-2953
 URL: https://issues.apache.org/jira/browse/SPARK-2953
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Reporter: Reynold Xin
Assignee: Reynold Xin

 Instead of requiring org.apache.spark.io.LZ4CompressionCodec, it is easier 
 to just accept lz4, lzf, snappy.






[jira] [Updated] (SPARK-2953) Allow using short names for io compression codecs

2014-08-10 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin updated SPARK-2953:
---

Description: Instead of requiring 
org.apache.spark.io.LZ4CompressionCodec, it is easier for users if Spark just 
accepts lz4, lzf, snappy.  (was: Instead of requiring 
org.apache.spark.io.LZ4CompressionCodec, it is easier to just accept lz4, 
lzf, snappy.)

 Allow using short names for io compression codecs
 -

 Key: SPARK-2953
 URL: https://issues.apache.org/jira/browse/SPARK-2953
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Reporter: Reynold Xin
Assignee: Reynold Xin

 Instead of requiring org.apache.spark.io.LZ4CompressionCodec, it is easier 
 for users if Spark just accepts lz4, lzf, snappy.






[jira] [Commented] (SPARK-2953) Allow using short names for io compression codecs

2014-08-10 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14092034#comment-14092034
 ] 

Apache Spark commented on SPARK-2953:
-

User 'rxin' has created a pull request for this issue:
https://github.com/apache/spark/pull/1873

 Allow using short names for io compression codecs
 -

 Key: SPARK-2953
 URL: https://issues.apache.org/jira/browse/SPARK-2953
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Reporter: Reynold Xin
Assignee: Reynold Xin

 Instead of requiring org.apache.spark.io.LZ4CompressionCodec, it is easier 
 for users if Spark just accepts lz4, lzf, snappy.
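 A minimal PySpark sketch of what the short names would let users write (this 
 assumes the existing {{spark.io.compression.codec}} property; the actual 
 short-name mapping is defined by the pull request above):
 {code}
 from pyspark import SparkConf, SparkContext

 # Today the codec has to be spelled out in full:
 #   .set("spark.io.compression.codec", "org.apache.spark.io.LZ4CompressionCodec")
 # With short-name support the same configuration would become:
 conf = (SparkConf()
         .setMaster("local[*]")
         .setAppName("short-codec-names")
         .set("spark.io.compression.codec", "lz4"))   # or "lzf", "snappy"
 sc = SparkContext(conf=conf)
 {code}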






[jira] [Updated] (SPARK-2947) DAGScheduler scheduling infinite loop

2014-08-10 Thread Guoqiang Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guoqiang Li updated SPARK-2947:
---

Summary: DAGScheduler scheduling infinite loop  (was: DAGScheduler 
scheduling dead cycle)

 DAGScheduler scheduling infinite loop
 -

 Key: SPARK-2947
 URL: https://issues.apache.org/jira/browse/SPARK-2947
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.0.0, 1.0.2
Reporter: Guoqiang Li
Priority: Blocker
 Fix For: 1.1.0, 1.0.3


 The stage is resubmitted more than 5 times.
 This seems to be caused by {{FetchFailed.bmAddress}} being null.
 I don't know how to reproduce it.
 master log:
 {noformat}
 14/08/09 21:50:17 INFO scheduler.TaskSetManager: Starting task 1.189:276 as 
 TID 52334 on executor 82: sanshan (PROCESS_LOCAL)
 14/08/09 21:50:17 INFO scheduler.TaskSetManager: Serialized task 1.189:276 as 
 3060 bytes in 0 ms
 14/08/09 21:50:17 INFO scheduler.TaskSetManager: Starting task 1.189:277 as 
 TID 52335 on executor 78: tuan231 (PROCESS_LOCAL)
 14/08/09 21:50:17 INFO scheduler.TaskSetManager: Serialized task 1.189:277 as 
 3060 bytes in 0 ms
 14/08/09 21:50:17 WARN scheduler.TaskSetManager: Lost TID 52199 (task 
 1.189:141)
 14/08/09 21:50:17 WARN scheduler.TaskSetManager: Loss was due to fetch 
 failure from null
 14/08/09 21:50:17 INFO scheduler.DAGScheduler: Marking Stage 1 (distinct at 
 DealCF.scala:215) for resubmision due to a fetch failure
 14/08/09 21:50:17 INFO scheduler.DAGScheduler: The failed fetch was from 
 Stage 2 (flatMap at DealCF.scala:207); marking it for resubmission
 14/08/09 21:50:17 WARN scheduler.TaskSetManager: Loss was due to fetch 
 failure from null
 14/08/09 21:50:17 INFO scheduler.DAGScheduler: Marking Stage 1 (distinct at 
 DealCF.scala:215) for resubmision due to a fetch failure
 14/08/09 21:50:17 INFO scheduler.DAGScheduler: The failed fetch was from 
 Stage 2 (flatMap at DealCF.scala:207); marking it for resubmission
  -- 5 times ---
 14/08/09 21:50:17 INFO scheduler.DAGScheduler: Marking Stage 1 (distinct at 
 DealCF.scala:215) for resubmision due to a fetch failure
 14/08/09 21:50:17 INFO scheduler.DAGScheduler: The failed fetch was from 
 Stage 2 (flatMap at DealCF.scala:207); marking it for resubmission
 14/08/09 21:50:17 INFO cluster.YarnClientClusterScheduler: Removed TaskSet 
 1.189, whose tasks have all completed, from pool 
 14/08/09 21:50:17 INFO scheduler.TaskSetManager: Finished TID 1869 in 87398 
 ms on jilin (progress: 280/280)
 14/08/09 21:50:17 INFO scheduler.DAGScheduler: Completed ShuffleMapTask(2, 
 269)
 14/08/09 21:50:17 INFO cluster.YarnClientClusterScheduler: Removed TaskSet 
 2.1, whose tasks have all completed, from pool 
 14/08/09 21:50:17 INFO scheduler.DAGScheduler: Stage 2 (flatMap at 
 DealCF.scala:207) finished in 129.544 s
 {noformat}
 worker: log
 {noformat}
 14/08/09 21:49:41 INFO spark.CacheManager: Partition rdd_23_57 not found, 
 computing it
 14/08/09 21:49:41 INFO spark.CacheManager: Partition rdd_23_191 not found, 
 computing it
 14/08/09 21:49:41 INFO executor.CoarseGrainedExecutorBackend: Got assigned 
 task 18017
 14/08/09 21:49:41 INFO executor.Executor: Running task ID 18017
 14/08/09 21:49:41 INFO storage.BlockManager: Found block broadcast_1 locally
 14/08/09 21:49:41 INFO storage.BlockManager: Found block broadcast_2 locally
 14/08/09 21:49:41 INFO executor.CoarseGrainedExecutorBackend: Got assigned 
 task 18151
 14/08/09 21:49:41 INFO storage.BlockManager: Found block broadcast_0 locally
 14/08/09 21:49:41 INFO executor.Executor: Running task ID 18151
 14/08/09 21:49:41 INFO storage.BlockManager: Found block broadcast_1 locally
 14/08/09 21:49:41 INFO storage.BlockManager: Found block broadcast_2 locally
 14/08/09 21:49:41 INFO storage.BlockManager: Found block broadcast_0 locally
 14/08/09 21:49:41 INFO spark.CacheManager: Partition rdd_23_86 not found, 
 computing it
 14/08/09 21:49:41 INFO spark.CacheManager: Partition rdd_23_220 not found, 
 computing it
 14/08/09 21:49:41 INFO executor.CoarseGrainedExecutorBackend: Got assigned 
 task 18285
 14/08/09 21:49:41 INFO executor.Executor: Running task ID 18285
 14/08/09 21:49:41 INFO storage.BlockManager: Found block broadcast_1 locally
 14/08/09 21:49:41 INFO storage.BlockManager: Found block broadcast_2 locally
 14/08/09 21:49:41 INFO executor.CoarseGrainedExecutorBackend: Got assigned 
 task 18419
 14/08/09 21:49:41 INFO executor.Executor: Running task ID 18419
 14/08/09 21:49:41 INFO storage.BlockManager: Found block broadcast_0 locally
 14/08/09 21:49:41 INFO storage.BlockManager: Found block broadcast_1 locally
 14/08/09 21:49:41 INFO storage.BlockManager: Found block broadcast_2 locally
 14/08/09 21:49:41 INFO storage.BlockManager: Found block broadcast_0 locally
 14/08/09 

[jira] [Created] (SPARK-2954) PySpark MLlib serialization tests fail on Python 2.6

2014-08-10 Thread Josh Rosen (JIRA)
Josh Rosen created SPARK-2954:
-

 Summary: PySpark MLlib serialization tests fail on Python 2.6
 Key: SPARK-2954
 URL: https://issues.apache.org/jira/browse/SPARK-2954
 Project: Spark
  Issue Type: Bug
Affects Versions: 1.1.0
Reporter: Josh Rosen


The PySpark MLlib tests currently fail on Python 2.6 due to problems unpacking 
data from bytearray using struct.unpack:

{code}
**
File pyspark/mllib/_common.py, line 181, in __main__._deserialize_double
Failed example:
_deserialize_double(_serialize_double(1L)) == 1.0
Exception raised:
Traceback (most recent call last):
  File 
/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/doctest.py,
 line 1253, in __run
compileflags, 1) in test.globs
  File doctest __main__._deserialize_double[4], line 1, in module
_deserialize_double(_serialize_double(1L)) == 1.0
  File pyspark/mllib/_common.py, line 194, in _deserialize_double
return struct.unpack(d, ba[offset:])[0]
error: unpack requires a string argument of length 8
**
File pyspark/mllib/_common.py, line 184, in __main__._deserialize_double
Failed example:
_deserialize_double(_serialize_double(sys.float_info.max)) == x
Exception raised:
Traceback (most recent call last):
  File 
/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/doctest.py,
 line 1253, in __run
compileflags, 1) in test.globs
  File doctest __main__._deserialize_double[6], line 1, in module
_deserialize_double(_serialize_double(sys.float_info.max)) == x
  File pyspark/mllib/_common.py, line 194, in _deserialize_double
return struct.unpack(d, ba[offset:])[0]
error: unpack requires a string argument of length 8
**
File pyspark/mllib/_common.py, line 187, in __main__._deserialize_double
Failed example:
_deserialize_double(_serialize_double(sys.float_info.max)) == y
Exception raised:
Traceback (most recent call last):
  File 
/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/doctest.py,
 line 1253, in __run
compileflags, 1) in test.globs
  File doctest __main__._deserialize_double[8], line 1, in module
_deserialize_double(_serialize_double(sys.float_info.max)) == y
  File pyspark/mllib/_common.py, line 194, in _deserialize_double
return struct.unpack(d, ba[offset:])[0]
error: unpack requires a string argument of length 8
**
{code}

It looks like one solution is to wrap the {{bytearray}} with {{buffer()}}: 
http://stackoverflow.com/a/15467046/590203






[jira] [Assigned] (SPARK-2954) PySpark MLlib serialization tests fail on Python 2.6

2014-08-10 Thread Josh Rosen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Rosen reassigned SPARK-2954:
-

Assignee: Josh Rosen

 PySpark MLlib serialization tests fail on Python 2.6
 

 Key: SPARK-2954
 URL: https://issues.apache.org/jira/browse/SPARK-2954
 Project: Spark
  Issue Type: Bug
  Components: PySpark
Affects Versions: 1.1.0
Reporter: Josh Rosen
Assignee: Josh Rosen

 The PySpark MLlib tests currently fail on Python 2.6 due to problems 
 unpacking data from bytearray using struct.unpack:
 {code}
 **
 File pyspark/mllib/_common.py, line 181, in __main__._deserialize_double
 Failed example:
 _deserialize_double(_serialize_double(1L)) == 1.0
 Exception raised:
 Traceback (most recent call last):
   File 
 /System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/doctest.py,
  line 1253, in __run
 compileflags, 1) in test.globs
   File doctest __main__._deserialize_double[4], line 1, in module
 _deserialize_double(_serialize_double(1L)) == 1.0
   File pyspark/mllib/_common.py, line 194, in _deserialize_double
 return struct.unpack(d, ba[offset:])[0]
 error: unpack requires a string argument of length 8
 **
 File pyspark/mllib/_common.py, line 184, in __main__._deserialize_double
 Failed example:
 _deserialize_double(_serialize_double(sys.float_info.max)) == x
 Exception raised:
 Traceback (most recent call last):
   File 
 /System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/doctest.py,
  line 1253, in __run
 compileflags, 1) in test.globs
   File doctest __main__._deserialize_double[6], line 1, in module
 _deserialize_double(_serialize_double(sys.float_info.max)) == x
   File pyspark/mllib/_common.py, line 194, in _deserialize_double
 return struct.unpack(d, ba[offset:])[0]
 error: unpack requires a string argument of length 8
 **
 File pyspark/mllib/_common.py, line 187, in __main__._deserialize_double
 Failed example:
 _deserialize_double(_serialize_double(sys.float_info.max)) == y
 Exception raised:
 Traceback (most recent call last):
   File 
 /System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/doctest.py,
  line 1253, in __run
 compileflags, 1) in test.globs
   File doctest __main__._deserialize_double[8], line 1, in module
 _deserialize_double(_serialize_double(sys.float_info.max)) == y
   File pyspark/mllib/_common.py, line 194, in _deserialize_double
 return struct.unpack(d, ba[offset:])[0]
 error: unpack requires a string argument of length 8
 **
 {code}
 It looks like one solution is to wrap the {{bytearray}} with {{buffer()}}: 
 http://stackoverflow.com/a/15467046/590203






[jira] [Updated] (SPARK-2954) PySpark MLlib serialization tests fail on Python 2.6

2014-08-10 Thread Josh Rosen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Rosen updated SPARK-2954:
--

Component/s: PySpark

 PySpark MLlib serialization tests fail on Python 2.6
 

 Key: SPARK-2954
 URL: https://issues.apache.org/jira/browse/SPARK-2954
 Project: Spark
  Issue Type: Bug
  Components: PySpark
Affects Versions: 1.1.0
Reporter: Josh Rosen

 The PySpark MLlib tests currently fail on Python 2.6 due to problems 
 unpacking data from bytearray using struct.unpack:
 {code}
 **
 File pyspark/mllib/_common.py, line 181, in __main__._deserialize_double
 Failed example:
 _deserialize_double(_serialize_double(1L)) == 1.0
 Exception raised:
 Traceback (most recent call last):
   File 
 /System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/doctest.py,
  line 1253, in __run
 compileflags, 1) in test.globs
   File doctest __main__._deserialize_double[4], line 1, in module
 _deserialize_double(_serialize_double(1L)) == 1.0
   File pyspark/mllib/_common.py, line 194, in _deserialize_double
 return struct.unpack(d, ba[offset:])[0]
 error: unpack requires a string argument of length 8
 **
 File pyspark/mllib/_common.py, line 184, in __main__._deserialize_double
 Failed example:
 _deserialize_double(_serialize_double(sys.float_info.max)) == x
 Exception raised:
 Traceback (most recent call last):
   File 
 /System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/doctest.py,
  line 1253, in __run
 compileflags, 1) in test.globs
   File doctest __main__._deserialize_double[6], line 1, in module
 _deserialize_double(_serialize_double(sys.float_info.max)) == x
   File pyspark/mllib/_common.py, line 194, in _deserialize_double
 return struct.unpack(d, ba[offset:])[0]
 error: unpack requires a string argument of length 8
 **
 File pyspark/mllib/_common.py, line 187, in __main__._deserialize_double
 Failed example:
 _deserialize_double(_serialize_double(sys.float_info.max)) == y
 Exception raised:
 Traceback (most recent call last):
   File 
 /System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/doctest.py,
  line 1253, in __run
 compileflags, 1) in test.globs
   File doctest __main__._deserialize_double[8], line 1, in module
 _deserialize_double(_serialize_double(sys.float_info.max)) == y
   File pyspark/mllib/_common.py, line 194, in _deserialize_double
 return struct.unpack(d, ba[offset:])[0]
 error: unpack requires a string argument of length 8
 **
 {code}
 It looks like one solution is to wrap the {{bytearray}} with {{buffer()}}: 
 http://stackoverflow.com/a/15467046/590203






[jira] [Commented] (SPARK-2954) PySpark MLlib serialization tests fail on Python 2.6

2014-08-10 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14092044#comment-14092044
 ] 

Apache Spark commented on SPARK-2954:
-

User 'JoshRosen' has created a pull request for this issue:
https://github.com/apache/spark/pull/1874

 PySpark MLlib serialization tests fail on Python 2.6
 

 Key: SPARK-2954
 URL: https://issues.apache.org/jira/browse/SPARK-2954
 Project: Spark
  Issue Type: Bug
  Components: PySpark
Affects Versions: 1.1.0
Reporter: Josh Rosen
Assignee: Josh Rosen

 The PySpark MLlib tests currently fail on Python 2.6 due to problems 
 unpacking data from bytearray using struct.unpack:
 {code}
 **
 File pyspark/mllib/_common.py, line 181, in __main__._deserialize_double
 Failed example:
 _deserialize_double(_serialize_double(1L)) == 1.0
 Exception raised:
 Traceback (most recent call last):
   File 
 /System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/doctest.py,
  line 1253, in __run
 compileflags, 1) in test.globs
   File doctest __main__._deserialize_double[4], line 1, in module
 _deserialize_double(_serialize_double(1L)) == 1.0
   File pyspark/mllib/_common.py, line 194, in _deserialize_double
 return struct.unpack(d, ba[offset:])[0]
 error: unpack requires a string argument of length 8
 **
 File pyspark/mllib/_common.py, line 184, in __main__._deserialize_double
 Failed example:
 _deserialize_double(_serialize_double(sys.float_info.max)) == x
 Exception raised:
 Traceback (most recent call last):
   File 
 /System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/doctest.py,
  line 1253, in __run
 compileflags, 1) in test.globs
   File doctest __main__._deserialize_double[6], line 1, in module
 _deserialize_double(_serialize_double(sys.float_info.max)) == x
   File pyspark/mllib/_common.py, line 194, in _deserialize_double
 return struct.unpack(d, ba[offset:])[0]
 error: unpack requires a string argument of length 8
 **
 File pyspark/mllib/_common.py, line 187, in __main__._deserialize_double
 Failed example:
 _deserialize_double(_serialize_double(sys.float_info.max)) == y
 Exception raised:
 Traceback (most recent call last):
   File 
 /System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/doctest.py,
  line 1253, in __run
 compileflags, 1) in test.globs
   File doctest __main__._deserialize_double[8], line 1, in module
 _deserialize_double(_serialize_double(sys.float_info.max)) == y
   File pyspark/mllib/_common.py, line 194, in _deserialize_double
 return struct.unpack(d, ba[offset:])[0]
 error: unpack requires a string argument of length 8
 **
 {code}
 It looks like one solution is to wrap the {{bytearray}} with {{buffer()}}: 
 http://stackoverflow.com/a/15467046/590203
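 A minimal Python 2 sketch of the suggested workaround (a hypothetical helper, 
 not the actual pyspark/mllib/_common.py code; {{buffer()}} only exists on 
 Python 2):
 {code}
 import struct

 def deserialize_double(ba, offset=0):
     # On Python 2.6, struct.unpack() rejects a bytearray slice with
     # "unpack requires a string argument of length 8"; reading through a
     # read-only buffer() view with unpack_from() avoids both the copy
     # and the error.
     return struct.unpack_from("d", buffer(ba), offset)[0]

 ba = bytearray(struct.pack("d", 1.0))
 assert deserialize_double(ba) == 1.0
 {code}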






[jira] [Commented] (SPARK-2948) PySpark doesn't work on Python 2.6

2014-08-10 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14092045#comment-14092045
 ] 

Apache Spark commented on SPARK-2948:
-

User 'JoshRosen' has created a pull request for this issue:
https://github.com/apache/spark/pull/1874

 PySpark doesn't work on Python 2.6
 --

 Key: SPARK-2948
 URL: https://issues.apache.org/jira/browse/SPARK-2948
 Project: Spark
  Issue Type: Bug
  Components: PySpark
Affects Versions: 1.1.0
 Environment: CentOS 6.5 / Python 2.6.6
Reporter: Kousuke Saruta
Priority: Blocker

 In serializers.py, collections.namedtuple is redefined as follows.
 {code}
 def namedtuple(name, fields, verbose=False, rename=False):
     cls = _old_namedtuple(name, fields, verbose, rename)
     return _hack_namedtuple(cls)
 {code}
 The wrapper takes and forwards 4 arguments, but namedtuple in Python 2.6 only 
 accepts 3, so there is a signature mismatch.






[jira] [Assigned] (SPARK-2948) PySpark doesn't work on Python 2.6

2014-08-10 Thread Josh Rosen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Rosen reassigned SPARK-2948:
-

Assignee: Josh Rosen

 PySpark doesn't work on Python 2.6
 --

 Key: SPARK-2948
 URL: https://issues.apache.org/jira/browse/SPARK-2948
 Project: Spark
  Issue Type: Bug
  Components: PySpark
Affects Versions: 1.1.0
 Environment: CentOS 6.5 / Python 2.6.6
Reporter: Kousuke Saruta
Assignee: Josh Rosen
Priority: Blocker

 In serializers.py, collections.namedtuple is redefined as follows.
 {code}
 def namedtuple(name, fields, verbose=False, rename=False):
     cls = _old_namedtuple(name, fields, verbose, rename)
     return _hack_namedtuple(cls)
 {code}
 The wrapper takes and forwards 4 arguments, but namedtuple in Python 2.6 only 
 accepts 3, so there is a signature mismatch.
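 One way to illustrate a version-tolerant wrapper (a sketch only, not 
 necessarily the fix adopted in Spark; {{_hack_namedtuple}} is stubbed out here 
 because the real helper lives in serializers.py):
 {code}
 import collections

 _old_namedtuple = collections.namedtuple

 def _hack_namedtuple(cls):
     # Stand-in for PySpark's real helper; its details do not matter here.
     return cls

 def namedtuple(*args, **kwargs):
     # Forward whatever arguments the caller passed instead of hard-coding the
     # Python 2.7 signature (name, fields, verbose, rename); Python 2.6's
     # collections.namedtuple only accepts (name, fields, verbose).
     cls = _old_namedtuple(*args, **kwargs)
     return _hack_namedtuple(cls)

 Point = namedtuple("Point", "x y")
 print(Point(1, 2))
 {code}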






[jira] [Commented] (SPARK-2101) Python unit tests fail on Python 2.6 because of lack of unittest.skipIf()

2014-08-10 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14092048#comment-14092048
 ] 

Apache Spark commented on SPARK-2101:
-

User 'JoshRosen' has created a pull request for this issue:
https://github.com/apache/spark/pull/1874

 Python unit tests fail on Python 2.6 because of lack of unittest.skipIf()
 -

 Key: SPARK-2101
 URL: https://issues.apache.org/jira/browse/SPARK-2101
 Project: Spark
  Issue Type: Bug
  Components: PySpark
Affects Versions: 1.0.0
Reporter: Uri Laserson
Assignee: Josh Rosen

 PySpark tests fail with Python 2.6 because they currently depend on 
 {{unittest.skipIf}}, which was only introduced in Python 2.7.
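 For illustration, one common way to keep such tests runnable on 2.6 is to fall 
 back to the unittest2 backport, which provides {{skipIf}} (a sketch under that 
 assumption, not necessarily what the linked pull request does):
 {code}
 import sys

 # unittest.skipIf() only appeared in Python 2.7; on 2.6 the unittest2
 # backport offers the same API under a different module name.
 if sys.version_info[:2] >= (2, 7):
     import unittest
 else:
     import unittest2 as unittest

 class ExampleSuite(unittest.TestCase):
     @unittest.skipIf(sys.platform.startswith("win"), "not supported on Windows")
     def test_addition(self):
         self.assertEqual(1 + 1, 2)

 if __name__ == "__main__":
     unittest.main()
 {code}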






[jira] [Commented] (SPARK-2910) Test with Python 2.6 on Jenkins

2014-08-10 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14092046#comment-14092046
 ] 

Apache Spark commented on SPARK-2910:
-

User 'JoshRosen' has created a pull request for this issue:
https://github.com/apache/spark/pull/1874

 Test with Python 2.6 on Jenkins
 ---

 Key: SPARK-2910
 URL: https://issues.apache.org/jira/browse/SPARK-2910
 Project: Spark
  Issue Type: Improvement
  Components: Project Infra, PySpark
Reporter: Josh Rosen

 As long as we continue to support Python 2.6 in PySpark, Jenkins should test  
 with Python 2.6.
 We could downgrade the system Python to 2.6, but it might be easier / cleaner 
 to install 2.6 alongside the current Python and {{export 
 PYSPARK_PYTHON=python2.6}} in the test runner script.






[jira] [Assigned] (SPARK-2910) Test with Python 2.6 on Jenkins

2014-08-10 Thread Josh Rosen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Rosen reassigned SPARK-2910:
-

Assignee: Josh Rosen

 Test with Python 2.6 on Jenkins
 ---

 Key: SPARK-2910
 URL: https://issues.apache.org/jira/browse/SPARK-2910
 Project: Spark
  Issue Type: Improvement
  Components: Project Infra, PySpark
Reporter: Josh Rosen
Assignee: Josh Rosen

 As long as we continue to support Python 2.6 in PySpark, Jenkins should test  
 with Python 2.6.
 We could downgrade the system Python to 2.6, but it might be easier / cleaner 
 to install 2.6 alongside the current Python and {{export 
 PYSPARK_PYTHON=python2.6}} in the test runner script.
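 As an illustration, a developer could exercise the same path locally before 
 Jenkins does; this sketch assumes a python2.6 binary on the PATH and relies on 
 PySpark honouring the PYSPARK_PYTHON variable when it launches workers:
 {code}
 import os
 from pyspark import SparkConf, SparkContext

 # PySpark starts its worker processes with whatever interpreter the
 # PYSPARK_PYTHON environment variable names, so set it before the
 # SparkContext is created to exercise the Python 2.6 code paths.
 os.environ["PYSPARK_PYTHON"] = "python2.6"

 sc = SparkContext(conf=SparkConf().setMaster("local[2]").setAppName("py26-smoke"))
 print(sc.parallelize(range(100)).sum())
 sc.stop()
 {code}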






[jira] [Commented] (SPARK-2945) Allow specifying num of executors in the context configuration

2014-08-10 Thread Shay Rojansky (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14092058#comment-14092058
 ] 

Shay Rojansky commented on SPARK-2945:
--

I just did a quick test on Spark 1.0.2, and spark.executor.instances does 
indeed appear to control the number of executors allocated (at least in YARN).

Should I keep this open for you guys to take a look and update the docs?

 Allow specifying num of executors in the context configuration
 --

 Key: SPARK-2945
 URL: https://issues.apache.org/jira/browse/SPARK-2945
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core, YARN
Affects Versions: 1.0.0
 Environment: Ubuntu precise, on YARN (CDH 5.1.0)
Reporter: Shay Rojansky

 Running on YARN, the only way to specify the number of executors seems to be 
 on the command line of spark-submit, via the --num-executors switch.
 In many cases this is too early. Our Spark app receives some cmdline 
 arguments which determine the amount of work that needs to be done - and that 
 affects the number of executors it ideally requires. Ideally, the Spark 
 context configuration would support specifying this like any other config 
 param.
 Our current workaround is a wrapper script that determines how much work is 
 needed, and which itself launches spark-submit with the number passed to 
 --num-executors - it's a shame to have to do this.
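 A sketch of the desired usage (relying on spark.executor.instances being 
 honoured by the YARN backend, as reported in the comment above; 
 compute_needed_executors() is a hypothetical piece of application logic):
 {code}
 from pyspark import SparkConf, SparkContext

 def compute_needed_executors():
     # Hypothetical: inspect the command-line arguments / workload size and
     # decide how many executors this run should request.
     return 8

 conf = (SparkConf()
         .setAppName("sized-to-the-workload")
         .set("spark.executor.instances", str(compute_needed_executors())))
 sc = SparkContext(conf=conf)
 {code}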






[jira] [Updated] (SPARK-2947) DAGScheduler resubmit the stage into an infinite loop

2014-08-10 Thread Guoqiang Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guoqiang Li updated SPARK-2947:
---

Summary: DAGScheduler resubmit the stage into an infinite loop  (was: 
DAGScheduler resubmit the task into an infinite loop)

 DAGScheduler resubmit the stage into an infinite loop
 -

 Key: SPARK-2947
 URL: https://issues.apache.org/jira/browse/SPARK-2947
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.0.0, 1.0.2
Reporter: Guoqiang Li
Priority: Blocker
 Fix For: 1.1.0, 1.0.3


 The stage is resubmitted more than 5 times.
 This seems to be caused by {{FetchFailed.bmAddress}} being null.
 I don't know how to reproduce it.
 master log:
 {noformat}
 14/08/09 21:50:17 INFO scheduler.TaskSetManager: Starting task 1.189:276 as 
 TID 52334 on executor 82: sanshan (PROCESS_LOCAL)
 14/08/09 21:50:17 INFO scheduler.TaskSetManager: Serialized task 1.189:276 as 
 3060 bytes in 0 ms
 14/08/09 21:50:17 INFO scheduler.TaskSetManager: Starting task 1.189:277 as 
 TID 52335 on executor 78: tuan231 (PROCESS_LOCAL)
 14/08/09 21:50:17 INFO scheduler.TaskSetManager: Serialized task 1.189:277 as 
 3060 bytes in 0 ms
 14/08/09 21:50:17 WARN scheduler.TaskSetManager: Lost TID 52199 (task 
 1.189:141)
 14/08/09 21:50:17 WARN scheduler.TaskSetManager: Loss was due to fetch 
 failure from null
 14/08/09 21:50:17 INFO scheduler.DAGScheduler: Marking Stage 1 (distinct at 
 DealCF.scala:215) for resubmision due to a fetch failure
 14/08/09 21:50:17 INFO scheduler.DAGScheduler: The failed fetch was from 
 Stage 2 (flatMap at DealCF.scala:207); marking it for resubmission
 14/08/09 21:50:17 WARN scheduler.TaskSetManager: Loss was due to fetch 
 failure from null
 14/08/09 21:50:17 INFO scheduler.DAGScheduler: Marking Stage 1 (distinct at 
 DealCF.scala:215) for resubmision due to a fetch failure
 14/08/09 21:50:17 INFO scheduler.DAGScheduler: The failed fetch was from 
 Stage 2 (flatMap at DealCF.scala:207); marking it for resubmission
  -- 5 times ---
 14/08/09 21:50:17 INFO scheduler.DAGScheduler: Marking Stage 1 (distinct at 
 DealCF.scala:215) for resubmision due to a fetch failure
 14/08/09 21:50:17 INFO scheduler.DAGScheduler: The failed fetch was from 
 Stage 2 (flatMap at DealCF.scala:207); marking it for resubmission
 14/08/09 21:50:17 INFO cluster.YarnClientClusterScheduler: Removed TaskSet 
 1.189, whose tasks have all completed, from pool 
 14/08/09 21:50:17 INFO scheduler.TaskSetManager: Finished TID 1869 in 87398 
 ms on jilin (progress: 280/280)
 14/08/09 21:50:17 INFO scheduler.DAGScheduler: Completed ShuffleMapTask(2, 
 269)
 14/08/09 21:50:17 INFO cluster.YarnClientClusterScheduler: Removed TaskSet 
 2.1, whose tasks have all completed, from pool 
 14/08/09 21:50:17 INFO scheduler.DAGScheduler: Stage 2 (flatMap at 
 DealCF.scala:207) finished in 129.544 s
 {noformat}
 worker: log
 {noformat}
 14/08/09 21:49:41 INFO spark.CacheManager: Partition rdd_23_57 not found, 
 computing it
 14/08/09 21:49:41 INFO spark.CacheManager: Partition rdd_23_191 not found, 
 computing it
 14/08/09 21:49:41 INFO executor.CoarseGrainedExecutorBackend: Got assigned 
 task 18017
 14/08/09 21:49:41 INFO executor.Executor: Running task ID 18017
 14/08/09 21:49:41 INFO storage.BlockManager: Found block broadcast_1 locally
 14/08/09 21:49:41 INFO storage.BlockManager: Found block broadcast_2 locally
 14/08/09 21:49:41 INFO executor.CoarseGrainedExecutorBackend: Got assigned 
 task 18151
 14/08/09 21:49:41 INFO storage.BlockManager: Found block broadcast_0 locally
 14/08/09 21:49:41 INFO executor.Executor: Running task ID 18151
 14/08/09 21:49:41 INFO storage.BlockManager: Found block broadcast_1 locally
 14/08/09 21:49:41 INFO storage.BlockManager: Found block broadcast_2 locally
 14/08/09 21:49:41 INFO storage.BlockManager: Found block broadcast_0 locally
 14/08/09 21:49:41 INFO spark.CacheManager: Partition rdd_23_86 not found, 
 computing it
 14/08/09 21:49:41 INFO spark.CacheManager: Partition rdd_23_220 not found, 
 computing it
 14/08/09 21:49:41 INFO executor.CoarseGrainedExecutorBackend: Got assigned 
 task 18285
 14/08/09 21:49:41 INFO executor.Executor: Running task ID 18285
 14/08/09 21:49:41 INFO storage.BlockManager: Found block broadcast_1 locally
 14/08/09 21:49:41 INFO storage.BlockManager: Found block broadcast_2 locally
 14/08/09 21:49:41 INFO executor.CoarseGrainedExecutorBackend: Got assigned 
 task 18419
 14/08/09 21:49:41 INFO executor.Executor: Running task ID 18419
 14/08/09 21:49:41 INFO storage.BlockManager: Found block broadcast_0 locally
 14/08/09 21:49:41 INFO storage.BlockManager: Found block broadcast_1 locally
 14/08/09 21:49:41 INFO storage.BlockManager: Found block broadcast_2 locally
 14/08/09 21:49:41 INFO 

[jira] [Updated] (SPARK-2947) DAGScheduler resubmit the task into an infinite loop

2014-08-10 Thread Guoqiang Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guoqiang Li updated SPARK-2947:
---

Summary: DAGScheduler resubmit the task into an infinite loop  (was: 
DAGScheduler scheduling infinite loop)

 DAGScheduler resubmit the task into an infinite loop
 

 Key: SPARK-2947
 URL: https://issues.apache.org/jira/browse/SPARK-2947
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.0.0, 1.0.2
Reporter: Guoqiang Li
Priority: Blocker
 Fix For: 1.1.0, 1.0.3


 The stage is resubmitted more than 5 times.
 This seems to be caused by {{FetchFailed.bmAddress}} being null.
 I don't know how to reproduce it.
 master log:
 {noformat}
 14/08/09 21:50:17 INFO scheduler.TaskSetManager: Starting task 1.189:276 as 
 TID 52334 on executor 82: sanshan (PROCESS_LOCAL)
 14/08/09 21:50:17 INFO scheduler.TaskSetManager: Serialized task 1.189:276 as 
 3060 bytes in 0 ms
 14/08/09 21:50:17 INFO scheduler.TaskSetManager: Starting task 1.189:277 as 
 TID 52335 on executor 78: tuan231 (PROCESS_LOCAL)
 14/08/09 21:50:17 INFO scheduler.TaskSetManager: Serialized task 1.189:277 as 
 3060 bytes in 0 ms
 14/08/09 21:50:17 WARN scheduler.TaskSetManager: Lost TID 52199 (task 
 1.189:141)
 14/08/09 21:50:17 WARN scheduler.TaskSetManager: Loss was due to fetch 
 failure from null
 14/08/09 21:50:17 INFO scheduler.DAGScheduler: Marking Stage 1 (distinct at 
 DealCF.scala:215) for resubmision due to a fetch failure
 14/08/09 21:50:17 INFO scheduler.DAGScheduler: The failed fetch was from 
 Stage 2 (flatMap at DealCF.scala:207); marking it for resubmission
 14/08/09 21:50:17 WARN scheduler.TaskSetManager: Loss was due to fetch 
 failure from null
 14/08/09 21:50:17 INFO scheduler.DAGScheduler: Marking Stage 1 (distinct at 
 DealCF.scala:215) for resubmision due to a fetch failure
 14/08/09 21:50:17 INFO scheduler.DAGScheduler: The failed fetch was from 
 Stage 2 (flatMap at DealCF.scala:207); marking it for resubmission
  -- 5 times ---
 14/08/09 21:50:17 INFO scheduler.DAGScheduler: Marking Stage 1 (distinct at 
 DealCF.scala:215) for resubmision due to a fetch failure
 14/08/09 21:50:17 INFO scheduler.DAGScheduler: The failed fetch was from 
 Stage 2 (flatMap at DealCF.scala:207); marking it for resubmission
 14/08/09 21:50:17 INFO cluster.YarnClientClusterScheduler: Removed TaskSet 
 1.189, whose tasks have all completed, from pool 
 14/08/09 21:50:17 INFO scheduler.TaskSetManager: Finished TID 1869 in 87398 
 ms on jilin (progress: 280/280)
 14/08/09 21:50:17 INFO scheduler.DAGScheduler: Completed ShuffleMapTask(2, 
 269)
 14/08/09 21:50:17 INFO cluster.YarnClientClusterScheduler: Removed TaskSet 
 2.1, whose tasks have all completed, from pool 
 14/08/09 21:50:17 INFO scheduler.DAGScheduler: Stage 2 (flatMap at 
 DealCF.scala:207) finished in 129.544 s
 {noformat}
 worker: log
 {noformat}
 14/08/09 21:49:41 INFO spark.CacheManager: Partition rdd_23_57 not found, 
 computing it
 14/08/09 21:49:41 INFO spark.CacheManager: Partition rdd_23_191 not found, 
 computing it
 14/08/09 21:49:41 INFO executor.CoarseGrainedExecutorBackend: Got assigned 
 task 18017
 14/08/09 21:49:41 INFO executor.Executor: Running task ID 18017
 14/08/09 21:49:41 INFO storage.BlockManager: Found block broadcast_1 locally
 14/08/09 21:49:41 INFO storage.BlockManager: Found block broadcast_2 locally
 14/08/09 21:49:41 INFO executor.CoarseGrainedExecutorBackend: Got assigned 
 task 18151
 14/08/09 21:49:41 INFO storage.BlockManager: Found block broadcast_0 locally
 14/08/09 21:49:41 INFO executor.Executor: Running task ID 18151
 14/08/09 21:49:41 INFO storage.BlockManager: Found block broadcast_1 locally
 14/08/09 21:49:41 INFO storage.BlockManager: Found block broadcast_2 locally
 14/08/09 21:49:41 INFO storage.BlockManager: Found block broadcast_0 locally
 14/08/09 21:49:41 INFO spark.CacheManager: Partition rdd_23_86 not found, 
 computing it
 14/08/09 21:49:41 INFO spark.CacheManager: Partition rdd_23_220 not found, 
 computing it
 14/08/09 21:49:41 INFO executor.CoarseGrainedExecutorBackend: Got assigned 
 task 18285
 14/08/09 21:49:41 INFO executor.Executor: Running task ID 18285
 14/08/09 21:49:41 INFO storage.BlockManager: Found block broadcast_1 locally
 14/08/09 21:49:41 INFO storage.BlockManager: Found block broadcast_2 locally
 14/08/09 21:49:41 INFO executor.CoarseGrainedExecutorBackend: Got assigned 
 task 18419
 14/08/09 21:49:41 INFO executor.Executor: Running task ID 18419
 14/08/09 21:49:41 INFO storage.BlockManager: Found block broadcast_0 locally
 14/08/09 21:49:41 INFO storage.BlockManager: Found block broadcast_1 locally
 14/08/09 21:49:41 INFO storage.BlockManager: Found block broadcast_2 locally
 14/08/09 21:49:41 INFO 

[jira] [Commented] (SPARK-2947) DAGScheduler resubmit the stage into an infinite loop

2014-08-10 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14092100#comment-14092100
 ] 

Apache Spark commented on SPARK-2947:
-

User 'witgo' has created a pull request for this issue:
https://github.com/apache/spark/pull/1877

 DAGScheduler resubmit the stage into an infinite loop
 -

 Key: SPARK-2947
 URL: https://issues.apache.org/jira/browse/SPARK-2947
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.0.0, 1.0.2
Reporter: Guoqiang Li
Priority: Blocker
 Fix For: 1.1.0, 1.0.3


 The stage is resubmitted more than 5 times.
 This seems to be caused by {{FetchFailed.bmAddress}} being null.
 I don't know how to reproduce it.
 master log:
 {noformat}
 14/08/09 21:50:17 INFO scheduler.TaskSetManager: Starting task 1.189:276 as 
 TID 52334 on executor 82: sanshan (PROCESS_LOCAL)
 14/08/09 21:50:17 INFO scheduler.TaskSetManager: Serialized task 1.189:276 as 
 3060 bytes in 0 ms
 14/08/09 21:50:17 INFO scheduler.TaskSetManager: Starting task 1.189:277 as 
 TID 52335 on executor 78: tuan231 (PROCESS_LOCAL)
 14/08/09 21:50:17 INFO scheduler.TaskSetManager: Serialized task 1.189:277 as 
 3060 bytes in 0 ms
 14/08/09 21:50:17 WARN scheduler.TaskSetManager: Lost TID 52199 (task 
 1.189:141)
 14/08/09 21:50:17 WARN scheduler.TaskSetManager: Loss was due to fetch 
 failure from null
 14/08/09 21:50:17 INFO scheduler.DAGScheduler: Marking Stage 1 (distinct at 
 DealCF.scala:215) for resubmision due to a fetch failure
 14/08/09 21:50:17 INFO scheduler.DAGScheduler: The failed fetch was from 
 Stage 2 (flatMap at DealCF.scala:207); marking it for resubmission
 14/08/09 21:50:17 WARN scheduler.TaskSetManager: Loss was due to fetch 
 failure from null
 14/08/09 21:50:17 INFO scheduler.DAGScheduler: Marking Stage 1 (distinct at 
 DealCF.scala:215) for resubmision due to a fetch failure
 14/08/09 21:50:17 INFO scheduler.DAGScheduler: The failed fetch was from 
 Stage 2 (flatMap at DealCF.scala:207); marking it for resubmission
  -- 5 times ---
 14/08/09 21:50:17 INFO scheduler.DAGScheduler: Marking Stage 1 (distinct at 
 DealCF.scala:215) for resubmision due to a fetch failure
 14/08/09 21:50:17 INFO scheduler.DAGScheduler: The failed fetch was from 
 Stage 2 (flatMap at DealCF.scala:207); marking it for resubmission
 14/08/09 21:50:17 INFO cluster.YarnClientClusterScheduler: Removed TaskSet 
 1.189, whose tasks have all completed, from pool 
 14/08/09 21:50:17 INFO scheduler.TaskSetManager: Finished TID 1869 in 87398 
 ms on jilin (progress: 280/280)
 14/08/09 21:50:17 INFO scheduler.DAGScheduler: Completed ShuffleMapTask(2, 
 269)
 14/08/09 21:50:17 INFO cluster.YarnClientClusterScheduler: Removed TaskSet 
 2.1, whose tasks have all completed, from pool 
 14/08/09 21:50:17 INFO scheduler.DAGScheduler: Stage 2 (flatMap at 
 DealCF.scala:207) finished in 129.544 s
 {noformat}
 worker: log
 {noformat}
 14/08/09 21:49:41 INFO spark.CacheManager: Partition rdd_23_57 not found, 
 computing it
 14/08/09 21:49:41 INFO spark.CacheManager: Partition rdd_23_191 not found, 
 computing it
 14/08/09 21:49:41 INFO executor.CoarseGrainedExecutorBackend: Got assigned 
 task 18017
 14/08/09 21:49:41 INFO executor.Executor: Running task ID 18017
 14/08/09 21:49:41 INFO storage.BlockManager: Found block broadcast_1 locally
 14/08/09 21:49:41 INFO storage.BlockManager: Found block broadcast_2 locally
 14/08/09 21:49:41 INFO executor.CoarseGrainedExecutorBackend: Got assigned 
 task 18151
 14/08/09 21:49:41 INFO storage.BlockManager: Found block broadcast_0 locally
 14/08/09 21:49:41 INFO executor.Executor: Running task ID 18151
 14/08/09 21:49:41 INFO storage.BlockManager: Found block broadcast_1 locally
 14/08/09 21:49:41 INFO storage.BlockManager: Found block broadcast_2 locally
 14/08/09 21:49:41 INFO storage.BlockManager: Found block broadcast_0 locally
 14/08/09 21:49:41 INFO spark.CacheManager: Partition rdd_23_86 not found, 
 computing it
 14/08/09 21:49:41 INFO spark.CacheManager: Partition rdd_23_220 not found, 
 computing it
 14/08/09 21:49:41 INFO executor.CoarseGrainedExecutorBackend: Got assigned 
 task 18285
 14/08/09 21:49:41 INFO executor.Executor: Running task ID 18285
 14/08/09 21:49:41 INFO storage.BlockManager: Found block broadcast_1 locally
 14/08/09 21:49:41 INFO storage.BlockManager: Found block broadcast_2 locally
 14/08/09 21:49:41 INFO executor.CoarseGrainedExecutorBackend: Got assigned 
 task 18419
 14/08/09 21:49:41 INFO executor.Executor: Running task ID 18419
 14/08/09 21:49:41 INFO storage.BlockManager: Found block broadcast_0 locally
 14/08/09 21:49:41 INFO storage.BlockManager: Found block broadcast_1 locally
 14/08/09 21:49:41 INFO storage.BlockManager: Found block broadcast_2 

[jira] [Updated] (SPARK-1297) Upgrade HBase dependency to 0.98.0

2014-08-10 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated SPARK-1297:
--

Attachment: spark-1297-v2.txt

Tentative patch adds hbase-hadoop2 profile.

 Upgrade HBase dependency to 0.98.0
 --

 Key: SPARK-1297
 URL: https://issues.apache.org/jira/browse/SPARK-1297
 Project: Spark
  Issue Type: Task
Reporter: Ted Yu
Priority: Minor
 Attachments: spark-1297-v2.txt


 HBase 0.94.6 was released 11 months ago.
 Upgrade HBase dependency to 0.98.0






[jira] [Updated] (SPARK-1297) Upgrade HBase dependency to 0.98.0

2014-08-10 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated SPARK-1297:
--

Attachment: (was: spark-1297-v2.txt)

 Upgrade HBase dependency to 0.98.0
 --

 Key: SPARK-1297
 URL: https://issues.apache.org/jira/browse/SPARK-1297
 Project: Spark
  Issue Type: Task
Reporter: Ted Yu
Priority: Minor

 HBase 0.94.6 was released 11 months ago.
 Upgrade HBase dependency to 0.98.0






[jira] [Updated] (SPARK-1297) Upgrade HBase dependency to 0.98.0

2014-08-10 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated SPARK-1297:
--

Attachment: spark-1297-v2.txt

 Upgrade HBase dependency to 0.98.0
 --

 Key: SPARK-1297
 URL: https://issues.apache.org/jira/browse/SPARK-1297
 Project: Spark
  Issue Type: Task
Reporter: Ted Yu
Priority: Minor
 Attachments: spark-1297-v2.txt


 HBase 0.94.6 was released 11 months ago.
 Upgrade HBase dependency to 0.98.0






[jira] [Commented] (SPARK-1297) Upgrade HBase dependency to 0.98.0

2014-08-10 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14092115#comment-14092115
 ] 

Sean Owen commented on SPARK-1297:
--

This doesn't work with Hadoop 1 though. It also requires turning on an HBase 
profile for every build. See my comments above; I think this can be made 
friendlier with more work in the profiles. I think it requires a hadoop1 
profile to really solve this kind of problem for every component, not just 
HBase.

 Upgrade HBase dependency to 0.98.0
 --

 Key: SPARK-1297
 URL: https://issues.apache.org/jira/browse/SPARK-1297
 Project: Spark
  Issue Type: Task
Reporter: Ted Yu
Priority: Minor
 Attachments: spark-1297-v2.txt


 HBase 0.94.6 was released 11 months ago.
 Upgrade HBase dependency to 0.98.0






[jira] [Commented] (SPARK-2944) sc.makeRDD doesn't distribute partitions evenly

2014-08-10 Thread Xiangrui Meng (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14092135#comment-14092135
 ] 

Xiangrui Meng commented on SPARK-2944:
--

Found that this behavior is not deterministic, so it is hard to tell which 
commit introduced it. It seems to happen when tasks are very small: some workers 
may get far more assignments than others because they finish their tasks very 
quickly and TaskSetManager always picks the first available one. (There is no 
randomization in `TaskSetManager`.)

 sc.makeRDD doesn't distribute partitions evenly
 ---

 Key: SPARK-2944
 URL: https://issues.apache.org/jira/browse/SPARK-2944
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.1.0
Reporter: Xiangrui Meng
Assignee: Xiangrui Meng
Priority: Critical

 16 nodes EC2 cluster:
 {code}
 val rdd = sc.makeRDD(0 until 1e9.toInt, 1000).cache()
 rdd.count()
 {code}
 Saw 156 partitions on one node while only 8 partitions on another.
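 One rough way to see the imbalance from PySpark (an approximation only: it 
 reports the hosts where follow-up tasks run, which prefer the cached blocks, 
 rather than reading block locations directly):
 {code}
 from collections import Counter
 from pyspark import SparkContext

 def host_of_partition(_):
     import socket
     return [socket.gethostname()]

 sc = SparkContext()
 rdd = sc.parallelize(range(int(1e7)), 1000).cache()
 rdd.count()
 # Count how many of the 1000 cached partitions end up on each host.
 print(Counter(rdd.mapPartitions(host_of_partition).collect()))
 {code}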






[jira] [Updated] (SPARK-2950) Add GC time and Shuffle Write time to JobLogger output

2014-08-10 Thread Shivaram Venkataraman (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shivaram Venkataraman updated SPARK-2950:
-

Fix Version/s: 1.2.0

 Add GC time and Shuffle Write time to JobLogger output
 --

 Key: SPARK-2950
 URL: https://issues.apache.org/jira/browse/SPARK-2950
 Project: Spark
  Issue Type: Improvement
Reporter: Shivaram Venkataraman
Assignee: Shivaram Venkataraman
Priority: Minor
 Fix For: 1.2.0


 The JobLogger is very useful for performing offline performance profiling of 
 Spark jobs. GC Time and Shuffle Write time are available in TaskMetrics but 
 are currently missing from the JobLogger output. This change adds these two 
 fields.






[jira] [Resolved] (SPARK-2950) Add GC time and Shuffle Write time to JobLogger output

2014-08-10 Thread Shivaram Venkataraman (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shivaram Venkataraman resolved SPARK-2950.
--

Resolution: Fixed

 Add GC time and Shuffle Write time to JobLogger output
 --

 Key: SPARK-2950
 URL: https://issues.apache.org/jira/browse/SPARK-2950
 Project: Spark
  Issue Type: Improvement
Reporter: Shivaram Venkataraman
Assignee: Shivaram Venkataraman
Priority: Minor
 Fix For: 1.2.0


 The JobLogger is very useful for performing offline performance profiling of 
 Spark jobs. GC Time and Shuffle Write time are available in TaskMetrics but 
 are currently missing from the JobLogger output. This change adds these two 
 fields.






[jira] [Resolved] (SPARK-2898) Failed to connect to daemon

2014-08-10 Thread Josh Rosen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Rosen resolved SPARK-2898.
---

   Resolution: Fixed
Fix Version/s: 1.1.0

 Failed to connect to daemon
 ---

 Key: SPARK-2898
 URL: https://issues.apache.org/jira/browse/SPARK-2898
 Project: Spark
  Issue Type: Bug
  Components: PySpark
Affects Versions: 1.1.0
Reporter: Davies Liu
Assignee: Davies Liu
 Fix For: 1.1.0


 There is a deadlock in handle_sigchld() because of logging.
 
 Java options: -Dspark.storage.memoryFraction=0.66 
 -Dspark.serializer=org.apache.spark.serializer.JavaSerializer 
 -Dspark.executor.memory=3g -Dspark.locality.wait=6000
 Options: SchedulerThroughputTest --num-tasks=1 --num-trials=4 
 --inter-trial-wait=1
 
 14/08/06 22:09:41 WARN JettyUtils: Failed to create UI on port 4040. Trying 
 again on port 4041. - Failure(java.net.BindException: Address already in use)
 worker 50114 crashed abruptly with exit status 1
 14/08/06 22:10:37 ERROR Executor: Exception in task 1476.0 in stage 1.0 (TID 
 11476)
 org.apache.spark.SparkException: Python worker exited unexpectedly (crashed)
   at 
 org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:150)
   at 
 org.apache.spark.api.python.PythonRDD$$anon$1.<init>(PythonRDD.scala:154)
   at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:87)
   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
   at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
   at org.apache.spark.scheduler.Task.run(Task.scala:54)
   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199)
   at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   at java.lang.Thread.run(Thread.java:745)
 Caused by: java.io.EOFException
   at java.io.DataInputStream.readInt(DataInputStream.java:392)
   at 
 org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:101)
   ... 10 more
 14/08/06 22:10:37 WARN PythonWorkerFactory: Failed to open socket to Python 
 daemon:
 java.net.ConnectException: Connection refused
   at java.net.PlainSocketImpl.socketConnect(Native Method)
   at 
 java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
   at 
 java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
   at 
 java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
   at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
   at java.net.Socket.connect(Socket.java:579)
   at java.net.Socket.connect(Socket.java:528)
   at java.net.Socket.<init>(Socket.java:425)
   at java.net.Socket.<init>(Socket.java:241)
   at 
 org.apache.spark.api.python.PythonWorkerFactory.createSocket$1(PythonWorkerFactory.scala:68)
   at 
 org.apache.spark.api.python.PythonWorkerFactory.liftedTree1$1(PythonWorkerFactory.scala:83)
   at 
 org.apache.spark.api.python.PythonWorkerFactory.createThroughDaemon(PythonWorkerFactory.scala:82)
   at 
 org.apache.spark.api.python.PythonWorkerFactory.create(PythonWorkerFactory.scala:55)
   at org.apache.spark.SparkEnv.createPythonWorker(SparkEnv.scala:101)
   at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:66)
   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
   at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
   at org.apache.spark.scheduler.Task.run(Task.scala:54)
   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199)
   at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   at java.lang.Thread.run(Thread.java:745)
 14/08/06 22:10:37 ERROR Executor: Exception in task 1478.0 in stage 1.0 (TID 
 11478)
 java.io.EOFException
   at java.io.DataInputStream.readInt(DataInputStream.java:392)
   at 
 org.apache.spark.api.python.PythonWorkerFactory.createSocket$1(PythonWorkerFactory.scala:69)
   at 
 org.apache.spark.api.python.PythonWorkerFactory.liftedTree1$1(PythonWorkerFactory.scala:83)
   at 
 org.apache.spark.api.python.PythonWorkerFactory.createThroughDaemon(PythonWorkerFactory.scala:82)
   at 
 org.apache.spark.api.python.PythonWorkerFactory.create(PythonWorkerFactory.scala:55)
   at 

[jira] [Created] (SPARK-2955) Test code fails to compile with mvn compile without install

2014-08-10 Thread Sean Owen (JIRA)
Sean Owen created SPARK-2955:


 Summary: Test code fails to compile with mvn compile without 
install 
 Key: SPARK-2955
 URL: https://issues.apache.org/jira/browse/SPARK-2955
 Project: Spark
  Issue Type: Bug
  Components: Build
Affects Versions: 1.0.2
Reporter: Sean Owen
Priority: Minor


(This is the corrected follow-up to 
https://issues.apache.org/jira/browse/SPARK-2903 )

Right now, mvn compile test-compile fails to compile Spark. (Don't worry; 
mvn package works, so this is not major.) The issue stems from test code in 
some modules depending on test code in other modules. That is perfectly fine 
and supported by Maven.

It takes extra work to get this to work with scalatest, and this has been 
attempted: https://github.com/apache/spark/blob/master/sql/catalyst/pom.xml#L86

This formulation is not quite enough, since the SQL Core module's tests fail to 
compile for lack of finding test classes in SQL Catalyst, and likewise for most 
Streaming integration modules depending on core Streaming test code. Example:

{code}
[error] 
/Users/srowen/Documents/spark/sql/core/src/test/scala/org/apache/spark/sql/QueryTest.scala:23:
 not found: type PlanTest
[error] class QueryTest extends PlanTest {
[error] ^
[error] 
/Users/srowen/Documents/spark/sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala:28:
 package org.apache.spark.sql.test is not a value
[error]   test("SPARK-1669: cacheTable should be idempotent") {
[error]   ^
...
{code}

The issue I believe is that generation of a test-jar is bound here to the 
compile phase, but the test classes are not being compiled in this phase. It 
should bind to the test-compile phase.

It works when executing mvn package or mvn install, since test-jar artifacts 
are actually generated and made available through normal Maven mechanisms as each 
module is built. They are then found normally, regardless of scalatest configuration.

It would be nice for a simple mvn compile test-compile to work since the test 
code is perfectly compilable given the Maven declarations.

On the plus side, this change is low-risk as it only affects tests.
[~yhuai] made the original scalatest change and has glanced at this and thinks 
it makes sense.






[jira] [Commented] (SPARK-2955) Test code fails to compile with mvn compile without install

2014-08-10 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14092180#comment-14092180
 ] 

Apache Spark commented on SPARK-2955:
-

User 'srowen' has created a pull request for this issue:
https://github.com/apache/spark/pull/1879

 Test code fails to compile with mvn compile without install 
 

 Key: SPARK-2955
 URL: https://issues.apache.org/jira/browse/SPARK-2955
 Project: Spark
  Issue Type: Bug
  Components: Build
Affects Versions: 1.0.2
Reporter: Sean Owen
Priority: Minor
  Labels: build, compile, scalatest, test, test-compile

 (This is the corrected follow-up to 
 https://issues.apache.org/jira/browse/SPARK-2903 )
 Right now, mvn compile test-compile fails to compile Spark. (Don't worry; 
 mvn package works, so this is not major.) The issue stems from test code in 
 some modules depending on test code in other modules. That is perfectly fine 
 and supported by Maven.
 It takes extra work to get this to work with scalatest, and this has been 
 attempted: 
 https://github.com/apache/spark/blob/master/sql/catalyst/pom.xml#L86
 This formulation is not quite enough, since the SQL Core module's tests fail 
 to compile for lack of finding test classes in SQL Catalyst, and likewise for 
 most Streaming integration modules depending on core Streaming test code. 
 Example:
 {code}
 [error] /Users/srowen/Documents/spark/sql/core/src/test/scala/org/apache/spark/sql/QueryTest.scala:23: not found: type PlanTest
 [error] class QueryTest extends PlanTest {
 [error] ^
 [error] /Users/srowen/Documents/spark/sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala:28: package org.apache.spark.sql.test is not a value
 [error]   test("SPARK-1669: cacheTable should be idempotent") {
 [error]   ^
 ...
 {code}
 The issue I believe is that generation of a test-jar is bound here to the 
 compile phase, but the test classes are not being compiled in this phase. It 
 should bind to the test-compile phase.
 It works when executing mvn package or mvn install since test-jar 
 artifacts are actually generated and made available through normal Maven mechanisms as 
 each module is built. They are then found normally, regardless of scalatest 
 configuration.
 It would be nice for a simple mvn compile test-compile to work since the 
 test code is perfectly compilable given the Maven declarations.
 On the plus side, this change is low-risk as it only affects tests.
 [~yhuai] made the original scalatest change and has glanced at this and 
 thinks it makes sense.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-2650) Caching tables larger than memory causes OOMs

2014-08-10 Thread Michael Armbrust (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Armbrust updated SPARK-2650:


Summary: Caching tables larger than memory causes OOMs  (was: Wrong initial 
sizes for in-memory column buffers)

 Caching tables larger than memory causes OOMs
 -

 Key: SPARK-2650
 URL: https://issues.apache.org/jira/browse/SPARK-2650
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.0.0, 1.0.1
Reporter: Michael Armbrust
Assignee: Cheng Lian
Priority: Critical

 The logic for setting up the initial column buffers is different for Spark 
 SQL compared to Shark, and I'm seeing OOMs when caching tables that are larger 
 than available memory (where Shark was okay).
 Two suspicious things: the initialSize is always set to 0, so we always go with 
 the default.  The default looks like it was copied from code like 10 * 1024 * 
 1024... but in Spark SQL it's 10 * 102 * 1024.
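
For concreteness, a quick check of the two constants mentioned above; the variable names here are illustrative, not the actual Spark SQL field names:

{code}
// Illustrative arithmetic only; names are hypothetical.
val presumablyIntended = 10 * 1024 * 1024 // 10,485,760 bytes, i.e. 10 MB
val currentDefault     = 10 * 102 * 1024  //  1,044,480 bytes, roughly 1 MB
// The current default is roughly a factor of 10 smaller than the 10 MB
// value it appears to have been copied from.
{code}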



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-2650) Caching tables larger than memory causes OOMs

2014-08-10 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14092209#comment-14092209
 ] 

Apache Spark commented on SPARK-2650:
-

User 'marmbrus' has created a pull request for this issue:
https://github.com/apache/spark/pull/1880

 Caching tables larger than memory causes OOMs
 -

 Key: SPARK-2650
 URL: https://issues.apache.org/jira/browse/SPARK-2650
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.0.0, 1.0.1
Reporter: Michael Armbrust
Assignee: Cheng Lian
Priority: Critical

 The logic for setting up the initial column buffers is different for Spark 
 SQL compared to Shark and I'm seeing OOMs when caching tables that are larger 
 than available memory (where shark was okay).
 Two suspicious things: the initialSize is always set to 0, so we always go with 
 the default.  The default looks like it was copied from code like 10 * 1024 * 
 1024... but in Spark SQL it's 10 * 102 * 1024.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-2937) Separate out sampleByKeyExact in PairRDDFunctions as its own API

2014-08-10 Thread Xiangrui Meng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng resolved SPARK-2937.
--

   Resolution: Fixed
Fix Version/s: 1.1.0

Issue resolved by pull request 1866
[https://github.com/apache/spark/pull/1866]

 Separate out sampleByKeyExact in PairRDDFunctions as its own API
 

 Key: SPARK-2937
 URL: https://issues.apache.org/jira/browse/SPARK-2937
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Reporter: Doris Xin
Assignee: Doris Xin
 Fix For: 1.1.0






--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-2956) Support transferring large blocks in Netty network module

2014-08-10 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin updated SPARK-2956:
---

Summary: Support transferring large blocks in Netty network module  (was: 
Support transferring blocks larger than MTU size)

 Support transferring large blocks in Netty network module
 -

 Key: SPARK-2956
 URL: https://issues.apache.org/jira/browse/SPARK-2956
 Project: Spark
  Issue Type: Sub-task
  Components: Shuffle, Spark Core
Reporter: Reynold Xin
Assignee: Reynold Xin
Priority: Critical

 The existing Netty shuffle implementation does not support large blocks. 
 The culprit is in FileClientHandler.channelRead0().



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-2956) Support transferring blocks larger than MTU size

2014-08-10 Thread Reynold Xin (JIRA)
Reynold Xin created SPARK-2956:
--

 Summary: Support transferring blocks larger than MTU size
 Key: SPARK-2956
 URL: https://issues.apache.org/jira/browse/SPARK-2956
 Project: Spark
  Issue Type: Sub-task
Reporter: Reynold Xin
Assignee: Reynold Xin
Priority: Critical


The existing Netty shuffle implementation does not support large blocks. 

The culprit is in FileClientHandler.channelRead0().



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-2957) Leverage Hadoop native io's fadvise and read-ahead in Netty transferTo

2014-08-10 Thread Reynold Xin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14092290#comment-14092290
 ] 

Reynold Xin commented on SPARK-2957:


cc [~tlipcon] [~t...@lipcon.org] will probably bug you when we work on this. 

 Leverage Hadoop native io's fadvise and read-ahead in Netty transferTo
 --

 Key: SPARK-2957
 URL: https://issues.apache.org/jira/browse/SPARK-2957
 Project: Spark
  Issue Type: Sub-task
  Components: Shuffle, Spark Core
Reporter: Reynold Xin





--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-2468) Netty-based shuffle network module

2014-08-10 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin updated SPARK-2468:
---

Summary: Netty-based shuffle network module  (was: Netty based network 
communication)

 Netty-based shuffle network module
 --

 Key: SPARK-2468
 URL: https://issues.apache.org/jira/browse/SPARK-2468
 Project: Spark
  Issue Type: Improvement
  Components: Shuffle, Spark Core
Reporter: Reynold Xin
Assignee: Reynold Xin
Priority: Critical

 Right now shuffle send goes through the block manager. This is inefficient 
 because it requires loading a block from disk into a kernel buffer, then into 
 a user space buffer, and then back to a kernel send buffer before it reaches 
 the NIC. It does multiple copies of the data and context switching between 
 kernel/user. It also creates unnecessary buffers in the JVM, which increases GC pressure.
 Instead, we should use FileChannel.transferTo, which handles this in the 
 kernel space with zero-copy. See 
 http://www.ibm.com/developerworks/library/j-zerocopy/
 One potential solution is to use Netty.  Spark already has a Netty based 
 network module implemented (org.apache.spark.network.netty). However, it 
 lacks some functionality and is turned off by default. 
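
As a rough illustration of the zero-copy path the description refers to, a minimal sketch using FileChannel.transferTo might look like the following; the method and parameter names are hypothetical, not Spark's:

{code}
import java.io.{File, FileInputStream}
import java.nio.channels.WritableByteChannel

// Sketch only: hand the file to the target channel inside the kernel via
// transferTo, avoiding the extra user-space copies described above.
def sendFileZeroCopy(file: File, target: WritableByteChannel): Long = {
  val fileChannel = new FileInputStream(file).getChannel
  try {
    var position = 0L
    val size = fileChannel.size()
    while (position < size) {
      // transferTo may transfer fewer bytes than requested, so loop until done.
      position += fileChannel.transferTo(position, size - position, target)
    }
    position
  } finally {
    fileChannel.close()
  }
}
{code}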



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-2468) Netty based network communication

2014-08-10 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin updated SPARK-2468:
---

Summary: Netty based network communication  (was: zero-copy shuffle network 
communication)

 Netty based network communication
 -

 Key: SPARK-2468
 URL: https://issues.apache.org/jira/browse/SPARK-2468
 Project: Spark
  Issue Type: Improvement
  Components: Shuffle, Spark Core
Reporter: Reynold Xin
Assignee: Reynold Xin
Priority: Critical

 Right now shuffle send goes through the block manager. This is inefficient 
 because it requires loading a block from disk into a kernel buffer, then into 
 a user space buffer, and then back to a kernel send buffer before it reaches 
 the NIC. It does multiple copies of the data and context switching between 
 kernel/user. It also creates unnecessary buffers in the JVM, which increases GC pressure.
 Instead, we should use FileChannel.transferTo, which handles this in the 
 kernel space with zero-copy. See 
 http://www.ibm.com/developerworks/library/j-zerocopy/
 One potential solution is to use Netty.  Spark already has a Netty based 
 network module implemented (org.apache.spark.network.netty). However, it 
 lacks some functionality and is turned off by default. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-2957) Leverage Hadoop native io's fadvise and read-ahead in Netty transferTo

2014-08-10 Thread Reynold Xin (JIRA)
Reynold Xin created SPARK-2957:
--

 Summary: Leverage Hadoop native io's fadvise and read-ahead in 
Netty transferTo
 Key: SPARK-2957
 URL: https://issues.apache.org/jira/browse/SPARK-2957
 Project: Spark
  Issue Type: Sub-task
Reporter: Reynold Xin






--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-2956) Support transferring large blocks in Netty network module

2014-08-10 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin updated SPARK-2956:
---

Description: 
The existing Netty shuffle implementation does not support large blocks. 

The culprit is in FileClientHandler.channelRead0().

We should add a LengthFieldBasedFrameDecoder to the pipeline.

  was:
The existing Netty shuffle implementation does not support large blocks. 

The culprit is in FileClientHandler.channelRead0().


 Support transferring large blocks in Netty network module
 -

 Key: SPARK-2956
 URL: https://issues.apache.org/jira/browse/SPARK-2956
 Project: Spark
  Issue Type: Sub-task
  Components: Shuffle, Spark Core
Reporter: Reynold Xin
Assignee: Reynold Xin
Priority: Critical

 The existing Netty shuffle implementation does not support large blocks. 
 The culprit is in FileClientHandler.channelRead0().
 We should add a LengthFieldBasedFrameDecoder to the pipeline.
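
A hedged sketch of what adding a LengthFieldBasedFrameDecoder to the pipeline could look like; the class and handler names are illustrative, not the actual FileClient code:

{code}
import io.netty.buffer.ByteBuf
import io.netty.channel.{ChannelHandlerContext, ChannelInitializer, SimpleChannelInboundHandler}
import io.netty.channel.socket.SocketChannel
import io.netty.handler.codec.LengthFieldBasedFrameDecoder

// Sketch only: reassemble frames using an 8-byte length prefix (stripped before
// delivery) so the handler always sees one complete block, however it was split
// on the wire.
class BlockFrameInitializer(maxFrameBytes: Int) extends ChannelInitializer[SocketChannel] {
  override def initChannel(ch: SocketChannel): Unit = {
    ch.pipeline()
      .addLast(new LengthFieldBasedFrameDecoder(maxFrameBytes, 0, 8, 0, 8))
      .addLast(new SimpleChannelInboundHandler[ByteBuf]() {
        override def channelRead0(ctx: ChannelHandlerContext, frame: ByteBuf): Unit = {
          // `frame` is now a full block rather than a partial read.
        }
      })
  }
}
{code}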



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-2957) Leverage Hadoop native io's fadvise and read-ahead in Netty transferTo

2014-08-10 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14092294#comment-14092294
 ] 

Todd Lipcon commented on SPARK-2957:


Sure, happy to help

 Leverage Hadoop native io's fadvise and read-ahead in Netty transferTo
 --

 Key: SPARK-2957
 URL: https://issues.apache.org/jira/browse/SPARK-2957
 Project: Spark
  Issue Type: Sub-task
  Components: Shuffle, Spark Core
Reporter: Reynold Xin





--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-2958) FileClientHandler should not be shared in the pipeline

2014-08-10 Thread Reynold Xin (JIRA)
Reynold Xin created SPARK-2958:
--

 Summary: FileClientHandler should not be shared in the pipeline
 Key: SPARK-2958
 URL: https://issues.apache.org/jira/browse/SPARK-2958
 Project: Spark
  Issue Type: Bug
Reporter: Reynold Xin


The Netty module creates a single FileClientHandler and shares it across all threads. 
We should create a new one for each pipeline thread.
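
A minimal sketch of the direction described above, assuming a stateful handler that must not be shared; names are hypothetical, not the actual FileClientHandler:

{code}
import io.netty.buffer.ByteBuf
import io.netty.channel.{ChannelHandlerContext, ChannelInitializer, SimpleChannelInboundHandler}
import io.netty.channel.socket.SocketChannel

// Sketch only: a handler with per-connection state should be instantiated inside
// initChannel so each channel gets its own copy instead of one shared instance.
class StatefulBlockHandler extends SimpleChannelInboundHandler[ByteBuf] {
  private var bytesReceived = 0L // per-connection state; unsafe to share across channels
  override def channelRead0(ctx: ChannelHandlerContext, msg: ByteBuf): Unit = {
    bytesReceived += msg.readableBytes()
  }
}

class PerChannelInitializer extends ChannelInitializer[SocketChannel] {
  override def initChannel(ch: SocketChannel): Unit = {
    ch.pipeline().addLast(new StatefulBlockHandler) // a fresh handler for every channel
  }
}
{code}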



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-2959) Use a single FileClient and Netty client thread pool

2014-08-10 Thread Reynold Xin (JIRA)
Reynold Xin created SPARK-2959:
--

 Summary: Use a single FileClient and Netty client thread pool
 Key: SPARK-2959
 URL: https://issues.apache.org/jira/browse/SPARK-2959
 Project: Spark
  Issue Type: Improvement
Reporter: Reynold Xin


The current implementation creates a new Netty bootstrap for fetching each 
block. This is pretty crazy! 

We should reuse the bootstrap in FileClient.
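
A hedged sketch of the reuse the description suggests: build one Bootstrap (and one event loop group) up front and only open a new connection per fetch. All names here are illustrative.

{code}
import io.netty.bootstrap.Bootstrap
import io.netty.channel.ChannelInitializer
import io.netty.channel.nio.NioEventLoopGroup
import io.netty.channel.socket.SocketChannel
import io.netty.channel.socket.nio.NioSocketChannel

// Sketch only: one shared Bootstrap and thread pool for all block fetches,
// instead of constructing a new Bootstrap per block.
class ReusableBlockClient {
  private val group = new NioEventLoopGroup()
  private val bootstrap = new Bootstrap()
    .group(group)
    .channel(classOf[NioSocketChannel])
    .handler(new ChannelInitializer[SocketChannel] {
      override def initChannel(ch: SocketChannel): Unit = {
        // pipeline setup (frame decoding, block handler) elided
      }
    })

  def fetch(host: String, port: Int): Unit = {
    // Each fetch reuses the same bootstrap; only the connection is new.
    val channel = bootstrap.connect(host, port).sync().channel()
    try {
      // request/response handling elided
    } finally {
      channel.close()
    }
  }

  def shutdown(): Unit = group.shutdownGracefully()
}
{code}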




--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-2677) BasicBlockFetchIterator#next can wait forever

2014-08-10 Thread Kousuke Saruta (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14092355#comment-14092355
 ] 

Kousuke Saruta commented on SPARK-2677:
---

SPARK-2538 was resolved, but this issue still remains.
I tried to resolve it in https://github.com/apache/spark/pull/1632

 BasicBlockFetchIterator#next can wait forever
 -

 Key: SPARK-2677
 URL: https://issues.apache.org/jira/browse/SPARK-2677
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 0.9.2, 1.0.0, 1.0.1
Reporter: Kousuke Saruta
Assignee: Josh Rosen
Priority: Blocker

 In BasicBlockFetchIterator#next, it waits for the fetch result on results.take().
 {code}
 override def next(): (BlockId, Option[Iterator[Any]]) = {
   resultsGotten += 1
   val startFetchWait = System.currentTimeMillis()
   val result = results.take()
   val stopFetchWait = System.currentTimeMillis()
   _fetchWaitTime += (stopFetchWait - startFetchWait)
   if (!result.failed) bytesInFlight -= result.size
   while (!fetchRequests.isEmpty &&
     (bytesInFlight == 0 || bytesInFlight + fetchRequests.front.size <= maxBytesInFlight)) {
     sendRequest(fetchRequests.dequeue())
   }
   (result.blockId, if (result.failed) None else Some(result.deserialize()))
 }
 {code}
 But results is implemented as a LinkedBlockingQueue, so if the remote executor 
 hangs up, the fetching executor waits forever.
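
One way to avoid the indefinite wait, sketched here under the assumption of a simple timeout policy; the FetchResult type and the timeout value are illustrative, not the actual fix in the linked pull request:

{code}
import java.util.concurrent.{LinkedBlockingQueue, TimeUnit, TimeoutException}

// Sketch only: replace the blocking take() with a bounded poll() so a hung
// remote executor cannot stall the fetching executor forever.
case class FetchResult(blockId: String, failed: Boolean)

def takeWithTimeout(results: LinkedBlockingQueue[FetchResult],
                    timeoutSeconds: Long = 60L): FetchResult = {
  val result = results.poll(timeoutSeconds, TimeUnit.SECONDS)
  if (result == null) {
    throw new TimeoutException(s"No fetch result received within $timeoutSeconds seconds")
  }
  result
}
{code}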



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-2960) Spark executables fail to start via symlinks

2014-08-10 Thread Shay Rojansky (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shay Rojansky updated SPARK-2960:
-

Priority: Minor  (was: Major)

 Spark executables fail to start via symlinks
 

 Key: SPARK-2960
 URL: https://issues.apache.org/jira/browse/SPARK-2960
 Project: Spark
  Issue Type: Bug
Reporter: Shay Rojansky
Priority: Minor
 Fix For: 1.0.2


 The current scripts (e.g. pyspark) fail to run when they are executed via 
 symlinks. A common Linux scenario would be to have Spark installed somewhere 
 (e.g. /opt) and have a symlink to it in /usr/bin.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-2960) Spark executables fail to start via symlinks

2014-08-10 Thread Shay Rojansky (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shay Rojansky updated SPARK-2960:
-

Summary: Spark executables fail to start via symlinks  (was: Spark 
executables failed to start via symlinks)

 Spark executables fail to start via symlinks
 

 Key: SPARK-2960
 URL: https://issues.apache.org/jira/browse/SPARK-2960
 Project: Spark
  Issue Type: Bug
Reporter: Shay Rojansky
 Fix For: 1.0.2


 The current scripts (e.g. pyspark) fail to run when they are executed via 
 symlinks. A common Linux scenario would be to have Spark installed somewhere 
 (e.g. /opt) and have a symlink to it in /usr/bin.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-2960) Spark executables failed to start via symlinks

2014-08-10 Thread Shay Rojansky (JIRA)
Shay Rojansky created SPARK-2960:


 Summary: Spark executables failed to start via symlinks
 Key: SPARK-2960
 URL: https://issues.apache.org/jira/browse/SPARK-2960
 Project: Spark
  Issue Type: Bug
Reporter: Shay Rojansky
 Fix For: 1.0.2


The current scripts (e.g. pyspark) fail to run when they are executed via 
symlinks. A common Linux scenario would be to have Spark installed somewhere 
(e.g. /opt) and have a symlink to it in /usr/bin.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-2961) Use statistics to skip partitions when reading from in-memory columnar data

2014-08-10 Thread Michael Armbrust (JIRA)
Michael Armbrust created SPARK-2961:
---

 Summary: Use statistics to skip partitions when reading from 
in-memory columnar data
 Key: SPARK-2961
 URL: https://issues.apache.org/jira/browse/SPARK-2961
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Reporter: Michael Armbrust
Assignee: Michael Armbrust






--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-2961) Use statistics to skip partitions when reading from in-memory columnar data

2014-08-10 Thread Michael Armbrust (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Armbrust updated SPARK-2961:


Target Version/s: 1.1.0

 Use statistics to skip partitions when reading from in-memory columnar data
 ---

 Key: SPARK-2961
 URL: https://issues.apache.org/jira/browse/SPARK-2961
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Reporter: Michael Armbrust
Assignee: Michael Armbrust





--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-2961) Use statistics to skip partitions when reading from in-memory columnar data

2014-08-10 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14092373#comment-14092373
 ] 

Apache Spark commented on SPARK-2961:
-

User 'marmbrus' has created a pull request for this issue:
https://github.com/apache/spark/pull/1883

 Use statistics to skip partitions when reading from in-memory columnar data
 ---

 Key: SPARK-2961
 URL: https://issues.apache.org/jira/browse/SPARK-2961
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Reporter: Michael Armbrust
Assignee: Michael Armbrust





--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-2062) VertexRDD.apply does not use the mergeFunc

2014-08-10 Thread Larry Xiao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14092381#comment-14092381
 ] 

Larry Xiao commented on SPARK-2062:
---

Is anyone working on this? I'd like to take it.
My plan is to add a pass to do the merge; is that OK? [~ankurd]

 VertexRDD.apply does not use the mergeFunc
 --

 Key: SPARK-2062
 URL: https://issues.apache.org/jira/browse/SPARK-2062
 Project: Spark
  Issue Type: Bug
  Components: GraphX
Reporter: Ankur Dave
Assignee: Ankur Dave

 Here: 
 https://github.com/apache/spark/blob/b1feb60209174433262de2a26d39616ba00edcc8/graphx/src/main/scala/org/apache/spark/graphx/VertexRDD.scala#L410
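
A rough sketch of what "a pass to do the merge" could mean conceptually: collapse duplicate vertex IDs with the caller's mergeFunc before the VertexRDD is built, approximated here with a plain reduceByKey. This is only an illustration, not the actual VertexRDD.apply implementation.

{code}
import scala.reflect.ClassTag
import org.apache.spark.SparkContext._
import org.apache.spark.graphx.VertexId
import org.apache.spark.rdd.RDD

// Sketch only: apply the user-supplied mergeFunc to duplicate vertex IDs.
def mergeDuplicateVertices[VD: ClassTag](
    vertices: RDD[(VertexId, VD)],
    mergeFunc: (VD, VD) => VD): RDD[(VertexId, VD)] = {
  vertices.reduceByKey(mergeFunc)
}
{code}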



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-2936) Migrate Netty network module from Java to Scala

2014-08-10 Thread Aaron Davidson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Davidson resolved SPARK-2936.
---

Resolution: Fixed

 Migrate Netty network module from Java to Scala
 ---

 Key: SPARK-2936
 URL: https://issues.apache.org/jira/browse/SPARK-2936
 Project: Spark
  Issue Type: Sub-task
  Components: Shuffle, Spark Core
Affects Versions: 1.1.0
Reporter: Reynold Xin
Assignee: Reynold Xin

 The netty network module was originally written when Scala 2.9.x had a bug 
 that prevented a pure Scala implementation, so a subset of the files were 
 written in Java. We have since upgraded to Scala 2.10 and can now migrate all 
 of the Java files to Scala.
 https://github.com/netty/netty/issues/781
 https://github.com/mesos/spark/pull/522



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-2962) Suboptimal scheduling in spark

2014-08-10 Thread Mridul Muralidharan (JIRA)
Mridul Muralidharan created SPARK-2962:
--

 Summary: Suboptimal scheduling in spark
 Key: SPARK-2962
 URL: https://issues.apache.org/jira/browse/SPARK-2962
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.1.0
 Environment: All
Reporter: Mridul Muralidharan



In findTask, irrespective of the 'locality' specified, pendingTasksWithNoPrefs are 
always scheduled with PROCESS_LOCAL.

pendingTasksWithNoPrefs contains tasks which currently do not have any alive 
locations, but which could come up 'later': this is particularly relevant when the 
Spark app is just coming up and containers are still being added.

This causes a large number of non-node-local tasks to be scheduled, incurring 
significant network transfers in the cluster when running with non-trivial 
datasets.

The comment "// Look for no-pref tasks after rack-local tasks since they can 
run anywhere." is misleading in the method code: locality levels start from 
process_local and go down to any, so no-pref tasks get scheduled much before rack.


Also note that currentLocalityIndex is reset to the taskLocality returned by 
this method, so returning PROCESS_LOCAL as the level will trigger wait times 
again. (This was relevant before a recent change to the scheduler, and might be 
again depending on how this issue is resolved.)


Found as part of writing a test for SPARK-2931.
 



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-2962) Suboptimal scheduling in spark

2014-08-10 Thread Matei Zaharia (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14092418#comment-14092418
 ] 

Matei Zaharia commented on SPARK-2962:
--

I thought this was fixed in https://github.com/apache/spark/pull/1313. Is that 
not the case?

 Suboptimal scheduling in spark
 --

 Key: SPARK-2962
 URL: https://issues.apache.org/jira/browse/SPARK-2962
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.1.0
 Environment: All
Reporter: Mridul Muralidharan

 In findTask, irrespective of 'locality' specified, pendingTasksWithNoPrefs 
 are always scheduled with PROCESS_LOCAL
 pendingTasksWithNoPrefs contains tasks which currently do not have any alive 
 locations - but which could come in 'later' : particularly relevant when 
 spark app is just coming up and containers are still being added.
 This causes a large number of non node local tasks to be scheduled incurring 
 significant network transfers in the cluster when running with non trivial 
 datasets.
 The comment // Look for no-pref tasks after rack-local tasks since they can 
 run anywhere. is misleading in the method code : locality levels start from 
 process_local down to any, and so no prefs get scheduled much before rack.
 Also note that, currentLocalityIndex is reset to the taskLocality returned by 
 this method - so returning PROCESS_LOCAL as the level will trigger wait times 
 again. (Was relevant before recent change to scheduler, and might be again 
 based on resolution of this issue).
 Found as part of writing test for SPARK-2931
  



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-2962) Suboptimal scheduling in spark

2014-08-10 Thread Mridul Muralidharan (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14092427#comment-14092427
 ] 

Mridul Muralidharan commented on SPARK-2962:


To give more context:

a) Our jobs start by loading data from DFS, so this is the first stage that 
gets executed.

b) We sleep for 1 minute before starting the jobs (in case the cluster is 
busy, etc.) - unfortunately, this is not sufficient, and IIRC there is no 
programmatic way to wait more deterministically for X% of nodes (was something 
added to alleviate this? I did see some discussion).

c) This becomes more of a problem because Spark does not honour preferred 
locations anymore while running on YARN. See SPARK-208 - due to 1.0 interface 
changes.
[ Practically, if we use a large enough number of nodes (with replication 
of 3 or higher), we usually do end up with quite a lot of data-local tasks 
eventually - so (c) is not an immediate concern for our current jobs assuming 
(b) is not an issue, though it is suboptimal in the general case. ]



 Suboptimal scheduling in spark
 --

 Key: SPARK-2962
 URL: https://issues.apache.org/jira/browse/SPARK-2962
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.1.0
 Environment: All
Reporter: Mridul Muralidharan

 In findTask, irrespective of 'locality' specified, pendingTasksWithNoPrefs 
 are always scheduled with PROCESS_LOCAL
 pendingTasksWithNoPrefs contains tasks which currently do not have any alive 
 locations - but which could come in 'later' : particularly relevant when 
 spark app is just coming up and containers are still being added.
 This causes a large number of non node local tasks to be scheduled incurring 
 significant network transfers in the cluster when running with non trivial 
 datasets.
 The comment // Look for no-pref tasks after rack-local tasks since they can 
 run anywhere. is misleading in the method code : locality levels start from 
 process_local down to any, and so no prefs get scheduled much before rack.
 Also note that, currentLocalityIndex is reset to the taskLocality returned by 
 this method - so returning PROCESS_LOCAL as the level will trigger wait times 
 again. (Was relevant before recent change to scheduler, and might be again 
 based on resolution of this issue).
 Found as part of writing test for SPARK-2931
  



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-2962) Suboptimal scheduling in spark

2014-08-10 Thread Mridul Muralidharan (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14092430#comment-14092430
 ] 

Mridul Muralidharan commented on SPARK-2962:


Hi [~matei],

  I am referencing the latest code (as of yesterday night).

pendingTasksWithNoPrefs currently contains both tasks which truly have no 
preference and tasks whose preferred locations are unavailable - and the 
latter is what is triggering this, since that can change during the execution 
of the stage.
Hope I am not missing something?

Thanks,
Mridul

 Suboptimal scheduling in spark
 --

 Key: SPARK-2962
 URL: https://issues.apache.org/jira/browse/SPARK-2962
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.1.0
 Environment: All
Reporter: Mridul Muralidharan

 In findTask, irrespective of 'locality' specified, pendingTasksWithNoPrefs 
 are always scheduled with PROCESS_LOCAL
 pendingTasksWithNoPrefs contains tasks which currently do not have any alive 
 locations - but which could come in 'later' : particularly relevant when 
 spark app is just coming up and containers are still being added.
 This causes a large number of non node local tasks to be scheduled incurring 
 significant network transfers in the cluster when running with non trivial 
 datasets.
 The comment // Look for no-pref tasks after rack-local tasks since they can 
 run anywhere. is misleading in the method code : locality levels start from 
 process_local down to any, and so no prefs get scheduled much before rack.
 Also note that, currentLocalityIndex is reset to the taskLocality returned by 
 this method - so returning PROCESS_LOCAL as the level will trigger wait times 
 again. (Was relevant before recent change to scheduler, and might be again 
 based on resolution of this issue).
 Found as part of writing test for SPARK-2931
  



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-2962) Suboptimal scheduling in spark

2014-08-10 Thread Mridul Muralidharan (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14092431#comment-14092431
 ] 

Mridul Muralidharan commented on SPARK-2962:


Note, I don't think this is a regression in 1.1; it probably existed much 
earlier too.
Other issues (like SPARK-2089) are making us notice this - we moved to 1.1 from 
0.9 recently.

 Suboptimal scheduling in spark
 --

 Key: SPARK-2962
 URL: https://issues.apache.org/jira/browse/SPARK-2962
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.1.0
 Environment: All
Reporter: Mridul Muralidharan

 In findTask, irrespective of 'locality' specified, pendingTasksWithNoPrefs 
 are always scheduled with PROCESS_LOCAL
 pendingTasksWithNoPrefs contains tasks which currently do not have any alive 
 locations - but which could come in 'later' : particularly relevant when 
 spark app is just coming up and containers are still being added.
 This causes a large number of non node local tasks to be scheduled incurring 
 significant network transfers in the cluster when running with non trivial 
 datasets.
 The comment // Look for no-pref tasks after rack-local tasks since they can 
 run anywhere. is misleading in the method code : locality levels start from 
 process_local down to any, and so no prefs get scheduled much before rack.
 Also note that, currentLocalityIndex is reset to the taskLocality returned by 
 this method - so returning PROCESS_LOCAL as the level will trigger wait times 
 again. (Was relevant before recent change to scheduler, and might be again 
 based on resolution of this issue).
 Found as part of writing test for SPARK-2931
  



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-2962) Suboptimal scheduling in spark

2014-08-10 Thread Mridul Muralidharan (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14092427#comment-14092427
 ] 

Mridul Muralidharan edited comment on SPARK-2962 at 8/11/14 4:35 AM:
-

To give more context; 

a) Our jobs start with load data from dfs as starting point : and so this is 
the first stage that gets executed.

b) We are sleeping for 1 minute before starting the jobs (in case cluster is 
busy, etc) - unfortunately, this is not sufficient and iirc there is no 
programmatic way to wait more deterministically for X% of node (was something 
added to alleviate this ? I did see some discussion)

c) This becomes more of a problem because spark does not honour preferred 
location anymore while running in yarn. See SPARK-2089 - due to 1.0 interface 
changes.
[ Practically, if we are using large enough number of nodes (with replication 
of 3 or higher), usually we do end up with quite of lot of data local tasks 
eventually - so (c) is not an immediate concern for our current jobs assuming 
(b) is not an issue, though it is suboptimal in general case ]




was (Author: mridulm80):
To give more context; 

a) Our jobs start with load data from dfs as starting point : and so this is 
the first stage that gets executed.

b) We are sleeping for 1 minute before starting the jobs (in case cluster is 
busy, etc) - unfortunately, this is not sufficient and iirc there is no 
programmatic way to wait more deterministically for X% of node (was something 
added to alleviate this ? I did see some discussion)

c) This becomes more of a problem because spark does not honour preferred 
location anymore while running in yarn. See SPARK-208 - due to 1.0 interface 
changes.
[ Practically, if we are using large enough number of nodes (with replication 
of 3 or higher), usually we do end up with quite of lot of data local tasks 
eventually - so (c) is not an immediate concern for our current jobs assuming 
(b) is not an issue, though it is suboptimal in general case ]



 Suboptimal scheduling in spark
 --

 Key: SPARK-2962
 URL: https://issues.apache.org/jira/browse/SPARK-2962
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.1.0
 Environment: All
Reporter: Mridul Muralidharan

 In findTask, irrespective of 'locality' specified, pendingTasksWithNoPrefs 
 are always scheduled with PROCESS_LOCAL
 pendingTasksWithNoPrefs contains tasks which currently do not have any alive 
 locations - but which could come in 'later' : particularly relevant when 
 spark app is just coming up and containers are still being added.
 This causes a large number of non node local tasks to be scheduled incurring 
 significant network transfers in the cluster when running with non trivial 
 datasets.
 The comment // Look for no-pref tasks after rack-local tasks since they can 
 run anywhere. is misleading in the method code : locality levels start from 
 process_local down to any, and so no prefs get scheduled much before rack.
 Also note that, currentLocalityIndex is reset to the taskLocality returned by 
 this method - so returning PROCESS_LOCAL as the level will trigger wait times 
 again. (Was relevant before recent change to scheduler, and might be again 
 based on resolution of this issue).
 Found as part of writing test for SPARK-2931
  



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-2912) Jenkins should include the commit hash in his messages

2014-08-10 Thread Michael Yannakopoulos (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14092434#comment-14092434
 ] 

Michael Yannakopoulos commented on SPARK-2912:
--

Hi Nicholas,

I can work on this issue!

Thanks,
Michael

 Jenkins should include the commit hash in his messages
 --

 Key: SPARK-2912
 URL: https://issues.apache.org/jira/browse/SPARK-2912
 Project: Spark
  Issue Type: Sub-task
  Components: Build
Reporter: Nicholas Chammas

 When there are multiple test cycles within a PR, it is not obvious what cycle 
 applies to what set of changes. This makes it more likely for committers to 
 merge a PR that has had new commits added since the last PR.
 Requirements:
 * Add the commit hash to Jenkins's messages so it's clear what the test cycle 
 corresponds to.
 * While you're at it, polish the formatting a bit.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-2912) Jenkins should include the commit hash in his messages

2014-08-10 Thread Patrick Wendell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14092435#comment-14092435
 ] 

Patrick Wendell commented on SPARK-2912:


Hey Michael - I believe [~nchammas] is already working on it actually, so I 
assigned him. 

 Jenkins should include the commit hash in his messages
 --

 Key: SPARK-2912
 URL: https://issues.apache.org/jira/browse/SPARK-2912
 Project: Spark
  Issue Type: Sub-task
  Components: Build
Reporter: Nicholas Chammas
Assignee: Nicholas Chammas

 When there are multiple test cycles within a PR, it is not obvious what cycle 
 applies to what set of changes. This makes it more likely for committers to 
 merge a PR that has had new commits added since the last PR.
 Requirements:
 * Add the commit hash to Jenkins's messages so it's clear what the test cycle 
 corresponds to.
 * While you're at it, polish the formatting a bit.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-2912) Jenkins should include the commit hash in his messages

2014-08-10 Thread Michael Yannakopoulos (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14092437#comment-14092437
 ] 

Michael Yannakopoulos commented on SPARK-2912:
--

Thanks for the quick reply, Patrick! I will try to find another open issue 
to work on instead.

 Jenkins should include the commit hash in his messages
 --

 Key: SPARK-2912
 URL: https://issues.apache.org/jira/browse/SPARK-2912
 Project: Spark
  Issue Type: Sub-task
  Components: Build
Reporter: Nicholas Chammas
Assignee: Nicholas Chammas

 When there are multiple test cycles within a PR, it is not obvious what cycle 
 applies to what set of changes. This makes it more likely for committers to 
 merge a PR that has had new commits added since the last PR.
 Requirements:
 * Add the commit hash to Jenkins's messages so it's clear what the test cycle 
 corresponds to.
 * While you're at it, polish the formatting a bit.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-2963) There is no documentation about building SparkSQL

2014-08-10 Thread Kousuke Saruta (JIRA)
Kousuke Saruta created SPARK-2963:
-

 Summary: There is no documentation about building SparkSQL
 Key: SPARK-2963
 URL: https://issues.apache.org/jira/browse/SPARK-2963
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.1.0
Reporter: Kousuke Saruta


Currently, if we'd like to use SparkSQL, we need to use the -Phive-thriftserver 
option when building, but this requirement is only implicit.
I think we need to describe how to build it.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org