[jira] [Updated] (SPARK-2953) Allow using short names for io compression codecs
[ https://issues.apache.org/jira/browse/SPARK-2953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-2953: --- Issue Type: Improvement (was: Bug) Allow using short names for io compression codecs - Key: SPARK-2953 URL: https://issues.apache.org/jira/browse/SPARK-2953 Project: Spark Issue Type: Improvement Components: Spark Core Reporter: Reynold Xin Assignee: Reynold Xin Instead of requiring org.apache.spark.io.LZ4CompressionCodec, it is easier to just accept lz4, lzf, snappy. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-2953) Allow using short names for io compression codecs
[ https://issues.apache.org/jira/browse/SPARK-2953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-2953: --- Component/s: Spark Core Allow using short names for io compression codecs - Key: SPARK-2953 URL: https://issues.apache.org/jira/browse/SPARK-2953 Project: Spark Issue Type: Improvement Components: Spark Core Reporter: Reynold Xin Assignee: Reynold Xin Instead of requiring org.apache.spark.io.LZ4CompressionCodec, it is easier to just accept lz4, lzf, snappy. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-2953) Allow using short names for io compression codecs
[ https://issues.apache.org/jira/browse/SPARK-2953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-2953: --- Description: Instead of requiring org.apache.spark.io.LZ4CompressionCodec, it is easier for users if Spark just accepts lz4, lzf, snappy. (was: Instead of requiring org.apache.spark.io.LZ4CompressionCodec, it is easier to just accept lz4, lzf, snappy.) Allow using short names for io compression codecs - Key: SPARK-2953 URL: https://issues.apache.org/jira/browse/SPARK-2953 Project: Spark Issue Type: Improvement Components: Spark Core Reporter: Reynold Xin Assignee: Reynold Xin Instead of requiring org.apache.spark.io.LZ4CompressionCodec, it is easier for users if Spark just accepts lz4, lzf, snappy. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-2953) Allow using short names for io compression codecs
[ https://issues.apache.org/jira/browse/SPARK-2953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14092034#comment-14092034 ] Apache Spark commented on SPARK-2953: - User 'rxin' has created a pull request for this issue: https://github.com/apache/spark/pull/1873 Allow using short names for io compression codecs - Key: SPARK-2953 URL: https://issues.apache.org/jira/browse/SPARK-2953 Project: Spark Issue Type: Improvement Components: Spark Core Reporter: Reynold Xin Assignee: Reynold Xin Instead of requiring org.apache.spark.io.LZ4CompressionCodec, it is easier for users if Spark just accepts lz4, lzf, snappy. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
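For context, the configuration property involved is {{spark.io.compression.codec}}. A minimal PySpark sketch of the before/after usage, assuming the short-name support from the pull request above is in place (the accepted short names lz4, lzf, snappy are taken from the issue description):
{code}
from pyspark import SparkConf, SparkContext

# Before this change, the full codec class name was required:
conf = SparkConf().set("spark.io.compression.codec",
                       "org.apache.spark.io.LZ4CompressionCodec")

# With short-name support, the same codec can be selected as just "lz4"
# (likewise "lzf" or "snappy"):
conf = SparkConf().set("spark.io.compression.codec", "lz4")

sc = SparkContext(conf=conf)
{code}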
[jira] [Updated] (SPARK-2947) DAGScheduler scheduling infinite loop
[ https://issues.apache.org/jira/browse/SPARK-2947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guoqiang Li updated SPARK-2947: --- Summary: DAGScheduler scheduling infinite loop (was: DAGScheduler scheduling dead cycle) DAGScheduler scheduling infinite loop - Key: SPARK-2947 URL: https://issues.apache.org/jira/browse/SPARK-2947 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.0.0, 1.0.2 Reporter: Guoqiang Li Priority: Blocker Fix For: 1.1.0, 1.0.3 Stage to resubmit more than 5 times. This seems to be caused by {{FetchFailed.bmAddress}} is null . I don't know how to reproduce it. master log: {noformat} 14/08/09 21:50:17 INFO scheduler.TaskSetManager: Starting task 1.189:276 as TID 52334 on executor 82: sanshan (PROCESS_LOCAL) 14/08/09 21:50:17 INFO scheduler.TaskSetManager: Serialized task 1.189:276 as 3060 bytes in 0 ms 14/08/09 21:50:17 INFO scheduler.TaskSetManager: Starting task 1.189:277 as TID 52335 on executor 78: tuan231 (PROCESS_LOCAL) 14/08/09 21:50:17 INFO scheduler.TaskSetManager: Serialized task 1.189:277 as 3060 bytes in 0 ms 14/08/09 21:50:17 WARN scheduler.TaskSetManager: Lost TID 52199 (task 1.189:141) 14/08/09 21:50:17 WARN scheduler.TaskSetManager: Loss was due to fetch failure from null 14/08/09 21:50:17 INFO scheduler.DAGScheduler: Marking Stage 1 (distinct at DealCF.scala:215) for resubmision due to a fetch failure 14/08/09 21:50:17 INFO scheduler.DAGScheduler: The failed fetch was from Stage 2 (flatMap at DealCF.scala:207); marking it for resubmission 14/08/09 21:50:17 WARN scheduler.TaskSetManager: Loss was due to fetch failure from null 14/08/09 21:50:17 INFO scheduler.DAGScheduler: Marking Stage 1 (distinct at DealCF.scala:215) for resubmision due to a fetch failure 14/08/09 21:50:17 INFO scheduler.DAGScheduler: The failed fetch was from Stage 2 (flatMap at DealCF.scala:207); marking it for resubmission -- 5 times --- 14/08/09 21:50:17 INFO scheduler.DAGScheduler: Marking Stage 1 (distinct at DealCF.scala:215) for resubmision due to a fetch failure 14/08/09 21:50:17 INFO scheduler.DAGScheduler: The failed fetch was from Stage 2 (flatMap at DealCF.scala:207); marking it for resubmission 14/08/09 21:50:17 INFO cluster.YarnClientClusterScheduler: Removed TaskSet 1.189, whose tasks have all completed, from pool 14/08/09 21:50:17 INFO scheduler.TaskSetManager: Finished TID 1869 in 87398 ms on jilin (progress: 280/280) 14/08/09 21:50:17 INFO scheduler.DAGScheduler: Completed ShuffleMapTask(2, 269) 14/08/09 21:50:17 INFO cluster.YarnClientClusterScheduler: Removed TaskSet 2.1, whose tasks have all completed, from pool 14/08/09 21:50:17 INFO scheduler.DAGScheduler: Stage 2 (flatMap at DealCF.scala:207) finished in 129.544 s {noformat} worker: log {noformat} /1408/09 21:49:41 INFO spark.CacheManager: Partition rdd_23_57 not found, computing it 14/08/09 21:49:41 INFO spark.CacheManager: Partition rdd_23_191 not found, computing it 14/08/09 21:49:41 INFO executor.CoarseGrainedExecutorBackend: Got assigned task 18017 14/08/09 21:49:41 INFO executor.Executor: Running task ID 18017 14/08/09 21:49:41 INFO storage.BlockManager: Found block broadcast_1 locally 14/08/09 21:49:41 INFO storage.BlockManager: Found block broadcast_2 locally 14/08/09 21:49:41 INFO executor.CoarseGrainedExecutorBackend: Got assigned task 18151 14/08/09 21:49:41 INFO storage.BlockManager: Found block broadcast_0 locally 14/08/09 21:49:41 INFO executor.Executor: Running task ID 18151 14/08/09 21:49:41 INFO storage.BlockManager: Found block broadcast_1 locally 
14/08/09 21:49:41 INFO storage.BlockManager: Found block broadcast_2 locally 14/08/09 21:49:41 INFO storage.BlockManager: Found block broadcast_0 locally 14/08/09 21:49:41 INFO spark.CacheManager: Partition rdd_23_86 not found, computing it 14/08/09 21:49:41 INFO spark.CacheManager: Partition rdd_23_220 not found, computing it 14/08/09 21:49:41 INFO executor.CoarseGrainedExecutorBackend: Got assigned task 18285 14/08/09 21:49:41 INFO executor.Executor: Running task ID 18285 14/08/09 21:49:41 INFO storage.BlockManager: Found block broadcast_1 locally 14/08/09 21:49:41 INFO storage.BlockManager: Found block broadcast_2 locally 14/08/09 21:49:41 INFO executor.CoarseGrainedExecutorBackend: Got assigned task 18419 14/08/09 21:49:41 INFO executor.Executor: Running task ID 18419 14/08/09 21:49:41 INFO storage.BlockManager: Found block broadcast_0 locally 14/08/09 21:49:41 INFO storage.BlockManager: Found block broadcast_1 locally 14/08/09 21:49:41 INFO storage.BlockManager: Found block broadcast_2 locally 14/08/09 21:49:41 INFO storage.BlockManager: Found block broadcast_0 locally 14/08/09
[jira] [Created] (SPARK-2954) PySpark MLlib serialization tests fail on Python 2.6
Josh Rosen created SPARK-2954: - Summary: PySpark MLlib serialization tests fail on Python 2.6 Key: SPARK-2954 URL: https://issues.apache.org/jira/browse/SPARK-2954 Project: Spark Issue Type: Bug Affects Versions: 1.1.0 Reporter: Josh Rosen The PySpark MLlib tests currently fail on Python 2.6 due to problems unpacking data from bytearray using struct.unpack: {code} ** File pyspark/mllib/_common.py, line 181, in __main__._deserialize_double Failed example: _deserialize_double(_serialize_double(1L)) == 1.0 Exception raised: Traceback (most recent call last): File /System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/doctest.py, line 1253, in __run compileflags, 1) in test.globs File doctest __main__._deserialize_double[4], line 1, in module _deserialize_double(_serialize_double(1L)) == 1.0 File pyspark/mllib/_common.py, line 194, in _deserialize_double return struct.unpack(d, ba[offset:])[0] error: unpack requires a string argument of length 8 ** File pyspark/mllib/_common.py, line 184, in __main__._deserialize_double Failed example: _deserialize_double(_serialize_double(sys.float_info.max)) == x Exception raised: Traceback (most recent call last): File /System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/doctest.py, line 1253, in __run compileflags, 1) in test.globs File doctest __main__._deserialize_double[6], line 1, in module _deserialize_double(_serialize_double(sys.float_info.max)) == x File pyspark/mllib/_common.py, line 194, in _deserialize_double return struct.unpack(d, ba[offset:])[0] error: unpack requires a string argument of length 8 ** File pyspark/mllib/_common.py, line 187, in __main__._deserialize_double Failed example: _deserialize_double(_serialize_double(sys.float_info.max)) == y Exception raised: Traceback (most recent call last): File /System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/doctest.py, line 1253, in __run compileflags, 1) in test.globs File doctest __main__._deserialize_double[8], line 1, in module _deserialize_double(_serialize_double(sys.float_info.max)) == y File pyspark/mllib/_common.py, line 194, in _deserialize_double return struct.unpack(d, ba[offset:])[0] error: unpack requires a string argument of length 8 ** {code} It looks like one solution is to wrap the {{bytearray}} with {{buffer()}}: http://stackoverflow.com/a/15467046/590203 -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
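A standalone sketch of the workaround suggested in the linked Stack Overflow answer — wrapping the {{bytearray}} in {{buffer()}} before calling {{struct.unpack}} — for Python 2.x. This illustrates the idea only; it is not necessarily the exact change made in pyspark/mllib/_common.py:
{code}
import struct

ba = bytearray(struct.pack("d", 1.0))

# On Python 2.6, passing a bytearray slice directly to struct.unpack raises
# "error: unpack requires a string argument of length 8", as in the doctest
# failures above. Wrapping it in buffer() exposes it via the read-only buffer
# protocol, which struct.unpack accepts (per the linked answer):
value = struct.unpack("d", buffer(ba))[0]
assert value == 1.0
{code}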
[jira] [Assigned] (SPARK-2954) PySpark MLlib serialization tests fail on Python 2.6
[ https://issues.apache.org/jira/browse/SPARK-2954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen reassigned SPARK-2954: - Assignee: Josh Rosen PySpark MLlib serialization tests fail on Python 2.6 Key: SPARK-2954 URL: https://issues.apache.org/jira/browse/SPARK-2954 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 1.1.0 Reporter: Josh Rosen Assignee: Josh Rosen The PySpark MLlib tests currently fail on Python 2.6 due to problems unpacking data from bytearray using struct.unpack: {code} ** File pyspark/mllib/_common.py, line 181, in __main__._deserialize_double Failed example: _deserialize_double(_serialize_double(1L)) == 1.0 Exception raised: Traceback (most recent call last): File /System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/doctest.py, line 1253, in __run compileflags, 1) in test.globs File doctest __main__._deserialize_double[4], line 1, in module _deserialize_double(_serialize_double(1L)) == 1.0 File pyspark/mllib/_common.py, line 194, in _deserialize_double return struct.unpack(d, ba[offset:])[0] error: unpack requires a string argument of length 8 ** File pyspark/mllib/_common.py, line 184, in __main__._deserialize_double Failed example: _deserialize_double(_serialize_double(sys.float_info.max)) == x Exception raised: Traceback (most recent call last): File /System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/doctest.py, line 1253, in __run compileflags, 1) in test.globs File doctest __main__._deserialize_double[6], line 1, in module _deserialize_double(_serialize_double(sys.float_info.max)) == x File pyspark/mllib/_common.py, line 194, in _deserialize_double return struct.unpack(d, ba[offset:])[0] error: unpack requires a string argument of length 8 ** File pyspark/mllib/_common.py, line 187, in __main__._deserialize_double Failed example: _deserialize_double(_serialize_double(sys.float_info.max)) == y Exception raised: Traceback (most recent call last): File /System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/doctest.py, line 1253, in __run compileflags, 1) in test.globs File doctest __main__._deserialize_double[8], line 1, in module _deserialize_double(_serialize_double(sys.float_info.max)) == y File pyspark/mllib/_common.py, line 194, in _deserialize_double return struct.unpack(d, ba[offset:])[0] error: unpack requires a string argument of length 8 ** {code} It looks like one solution is to wrap the {{bytearray}} with {{buffer()}}: http://stackoverflow.com/a/15467046/590203 -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-2954) PySpark MLlib serialization tests fail on Python 2.6
[ https://issues.apache.org/jira/browse/SPARK-2954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-2954: -- Component/s: PySpark PySpark MLlib serialization tests fail on Python 2.6 Key: SPARK-2954 URL: https://issues.apache.org/jira/browse/SPARK-2954 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 1.1.0 Reporter: Josh Rosen The PySpark MLlib tests currently fail on Python 2.6 due to problems unpacking data from bytearray using struct.unpack: {code} ** File pyspark/mllib/_common.py, line 181, in __main__._deserialize_double Failed example: _deserialize_double(_serialize_double(1L)) == 1.0 Exception raised: Traceback (most recent call last): File /System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/doctest.py, line 1253, in __run compileflags, 1) in test.globs File doctest __main__._deserialize_double[4], line 1, in module _deserialize_double(_serialize_double(1L)) == 1.0 File pyspark/mllib/_common.py, line 194, in _deserialize_double return struct.unpack(d, ba[offset:])[0] error: unpack requires a string argument of length 8 ** File pyspark/mllib/_common.py, line 184, in __main__._deserialize_double Failed example: _deserialize_double(_serialize_double(sys.float_info.max)) == x Exception raised: Traceback (most recent call last): File /System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/doctest.py, line 1253, in __run compileflags, 1) in test.globs File doctest __main__._deserialize_double[6], line 1, in module _deserialize_double(_serialize_double(sys.float_info.max)) == x File pyspark/mllib/_common.py, line 194, in _deserialize_double return struct.unpack(d, ba[offset:])[0] error: unpack requires a string argument of length 8 ** File pyspark/mllib/_common.py, line 187, in __main__._deserialize_double Failed example: _deserialize_double(_serialize_double(sys.float_info.max)) == y Exception raised: Traceback (most recent call last): File /System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/doctest.py, line 1253, in __run compileflags, 1) in test.globs File doctest __main__._deserialize_double[8], line 1, in module _deserialize_double(_serialize_double(sys.float_info.max)) == y File pyspark/mllib/_common.py, line 194, in _deserialize_double return struct.unpack(d, ba[offset:])[0] error: unpack requires a string argument of length 8 ** {code} It looks like one solution is to wrap the {{bytearray}} with {{buffer()}}: http://stackoverflow.com/a/15467046/590203 -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-2954) PySpark MLlib serialization tests fail on Python 2.6
[ https://issues.apache.org/jira/browse/SPARK-2954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14092044#comment-14092044 ] Apache Spark commented on SPARK-2954: - User 'JoshRosen' has created a pull request for this issue: https://github.com/apache/spark/pull/1874 PySpark MLlib serialization tests fail on Python 2.6 Key: SPARK-2954 URL: https://issues.apache.org/jira/browse/SPARK-2954 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 1.1.0 Reporter: Josh Rosen Assignee: Josh Rosen The PySpark MLlib tests currently fail on Python 2.6 due to problems unpacking data from bytearray using struct.unpack: {code} ** File pyspark/mllib/_common.py, line 181, in __main__._deserialize_double Failed example: _deserialize_double(_serialize_double(1L)) == 1.0 Exception raised: Traceback (most recent call last): File /System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/doctest.py, line 1253, in __run compileflags, 1) in test.globs File doctest __main__._deserialize_double[4], line 1, in module _deserialize_double(_serialize_double(1L)) == 1.0 File pyspark/mllib/_common.py, line 194, in _deserialize_double return struct.unpack(d, ba[offset:])[0] error: unpack requires a string argument of length 8 ** File pyspark/mllib/_common.py, line 184, in __main__._deserialize_double Failed example: _deserialize_double(_serialize_double(sys.float_info.max)) == x Exception raised: Traceback (most recent call last): File /System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/doctest.py, line 1253, in __run compileflags, 1) in test.globs File doctest __main__._deserialize_double[6], line 1, in module _deserialize_double(_serialize_double(sys.float_info.max)) == x File pyspark/mllib/_common.py, line 194, in _deserialize_double return struct.unpack(d, ba[offset:])[0] error: unpack requires a string argument of length 8 ** File pyspark/mllib/_common.py, line 187, in __main__._deserialize_double Failed example: _deserialize_double(_serialize_double(sys.float_info.max)) == y Exception raised: Traceback (most recent call last): File /System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/doctest.py, line 1253, in __run compileflags, 1) in test.globs File doctest __main__._deserialize_double[8], line 1, in module _deserialize_double(_serialize_double(sys.float_info.max)) == y File pyspark/mllib/_common.py, line 194, in _deserialize_double return struct.unpack(d, ba[offset:])[0] error: unpack requires a string argument of length 8 ** {code} It looks like one solution is to wrap the {{bytearray}} with {{buffer()}}: http://stackoverflow.com/a/15467046/590203 -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-2948) PySpark doesn't work on Python 2.6
[ https://issues.apache.org/jira/browse/SPARK-2948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14092045#comment-14092045 ] Apache Spark commented on SPARK-2948: - User 'JoshRosen' has created a pull request for this issue: https://github.com/apache/spark/pull/1874 PySpark doesn't work on Python 2.6 -- Key: SPARK-2948 URL: https://issues.apache.org/jira/browse/SPARK-2948 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 1.1.0 Environment: CentOS 6.5 / Python 2.6.6 Reporter: Kousuke Saruta Priority: Blocker In serializers.py, collections.namedtuple is redefined as follows. {code} def namedtuple(name, fields, verbose=False, rename=False): cls = _old_namedtuple(name, fields, verbose, rename) return _hack_namedtuple(cls) {code} The redefined function takes 4 arguments, but namedtuple in Python 2.6 takes only 3, so a mismatch occurs. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-2948) PySpark doesn't work on Python 2.6
[ https://issues.apache.org/jira/browse/SPARK-2948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen reassigned SPARK-2948: - Assignee: Josh Rosen PySpark doesn't work on Python 2.6 -- Key: SPARK-2948 URL: https://issues.apache.org/jira/browse/SPARK-2948 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 1.1.0 Environment: CentOS 6.5 / Python 2.6.6 Reporter: Kousuke Saruta Assignee: Josh Rosen Priority: Blocker In serializers.py, collections.namedtuple is redefined as follows. {code} def namedtuple(name, fields, verbose=False, rename=False): cls = _old_namedtuple(name, fields, verbose, rename) return _hack_namedtuple(cls) {code} The redefined function takes 4 arguments, but namedtuple in Python 2.6 takes only 3, so a mismatch occurs. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
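A version-agnostic sketch of the kind of shim that avoids the mismatch: forwarding whatever arguments the caller passed instead of hard-coding Python 2.7's four-argument signature. Here {{_hack_namedtuple}} is a stand-in for the existing helper in serializers.py, and this is an illustration rather than necessarily the fix that was merged:
{code}
import collections

_old_namedtuple = collections.namedtuple

def _hack_namedtuple(cls):
    # Placeholder for the existing helper in serializers.py.
    return cls

def namedtuple(*args, **kwargs):
    # Forward the arguments as given, so the wrapper works with both the
    # 3-argument namedtuple in Python 2.6 and the 4-argument one in 2.7.
    cls = _old_namedtuple(*args, **kwargs)
    return _hack_namedtuple(cls)

Point = namedtuple("Point", ["x", "y"])
print(Point(1, 2))
{code}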
[jira] [Commented] (SPARK-2101) Python unit tests fail on Python 2.6 because of lack of unittest.skipIf()
[ https://issues.apache.org/jira/browse/SPARK-2101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14092048#comment-14092048 ] Apache Spark commented on SPARK-2101: - User 'JoshRosen' has created a pull request for this issue: https://github.com/apache/spark/pull/1874 Python unit tests fail on Python 2.6 because of lack of unittest.skipIf() - Key: SPARK-2101 URL: https://issues.apache.org/jira/browse/SPARK-2101 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 1.0.0 Reporter: Uri Laserson Assignee: Josh Rosen PySpark tests fail with Python 2.6 because they currently depend on {{unittest.skipIf}}, which was only introduced in Python 2.7. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
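One common way to keep conditional skips working on Python 2.6 is the {{unittest2}} backport, which provides {{skipIf}} with the 2.7 API. A hedged sketch follows; whether the pull request above took exactly this route is not stated here:
{code}
import sys

try:
    import unittest2 as unittest  # backport that provides skipIf on Python 2.6
except ImportError:
    import unittest               # Python 2.7+ has skipIf built in

class ExampleSuite(unittest.TestCase):
    @unittest.skipIf(sys.version_info[:2] < (2, 7), "needs Python 2.7 features")
    def test_uses_27_feature(self):
        self.assertTrue(True)
{code}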
[jira] [Commented] (SPARK-2910) Test with Python 2.6 on Jenkins
[ https://issues.apache.org/jira/browse/SPARK-2910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14092046#comment-14092046 ] Apache Spark commented on SPARK-2910: - User 'JoshRosen' has created a pull request for this issue: https://github.com/apache/spark/pull/1874 Test with Python 2.6 on Jenkins --- Key: SPARK-2910 URL: https://issues.apache.org/jira/browse/SPARK-2910 Project: Spark Issue Type: Improvement Components: Project Infra, PySpark Reporter: Josh Rosen As long as we continue to support Python 2.6 in PySpark, Jenkins should test with Python 2.6. We could downgrade the system Python to 2.6, but it might be easier / cleaner to install 2.6 alongside the current Python and {{export PYSPARK_PYTHON=python2.6}} in the test runner script. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-2910) Test with Python 2.6 on Jenkins
[ https://issues.apache.org/jira/browse/SPARK-2910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen reassigned SPARK-2910: - Assignee: Josh Rosen Test with Python 2.6 on Jenkins --- Key: SPARK-2910 URL: https://issues.apache.org/jira/browse/SPARK-2910 Project: Spark Issue Type: Improvement Components: Project Infra, PySpark Reporter: Josh Rosen Assignee: Josh Rosen As long as we continue to support Python 2.6 in PySpark, Jenkins should test with Python 2.6. We could downgrade the system Python to 2.6, but it might be easier / cleaner to install 2.6 alongside the current Python and {{export PYSPARK_PYTHON=python2.6}} in the test runner script. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-2945) Allow specifying num of executors in the context configuration
[ https://issues.apache.org/jira/browse/SPARK-2945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14092058#comment-14092058 ] Shay Rojansky commented on SPARK-2945: -- I just did a quick test on Spark 1.0.2, and spark.executor.instances does indeed appear to control the number of executors allocated (at least in YARN). Should I keep this open for you guys to take a look and update the docs? Allow specifying num of executors in the context configuration -- Key: SPARK-2945 URL: https://issues.apache.org/jira/browse/SPARK-2945 Project: Spark Issue Type: Improvement Components: Spark Core, YARN Affects Versions: 1.0.0 Environment: Ubuntu precise, on YARN (CDH 5.1.0) Reporter: Shay Rojansky Running on YARN, the only way to specify the number of executors seems to be on the command line of spark-submit, via the --num-executors switch. In many cases this is too early. Our Spark app receives some cmdline arguments which determine the amount of work that needs to be done - and that affects the number of executors it ideally requires. Ideally, the Spark context configuration would support specifying this like any other config param. Our current workaround is a wrapper script that determines how much work is needed, and which itself launches spark-submit with the number passed to --num-executors - it's a shame to have to do this. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
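Based on the comment above, {{spark.executor.instances}} can be set like any other configuration property. A PySpark sketch of doing so programmatically instead of passing {{--num-executors}} to spark-submit (the value 8 is arbitrary, and this reflects the YARN-mode observation reported above rather than documented semantics at the time):
{code}
from pyspark import SparkConf, SparkContext

num_executors = 8  # decided at runtime from the amount of work to do

conf = (SparkConf()
        .setAppName("example")
        .set("spark.executor.instances", str(num_executors)))

sc = SparkContext(conf=conf)
{code}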
[jira] [Updated] (SPARK-2947) DAGScheduler resubmit the stage into an infinite loop
[ https://issues.apache.org/jira/browse/SPARK-2947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guoqiang Li updated SPARK-2947: --- Summary: DAGScheduler resubmit the stage into an infinite loop (was: DAGScheduler resubmit the task into an infinite loop) DAGScheduler resubmit the stage into an infinite loop - Key: SPARK-2947 URL: https://issues.apache.org/jira/browse/SPARK-2947 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.0.0, 1.0.2 Reporter: Guoqiang Li Priority: Blocker Fix For: 1.1.0, 1.0.3 Stage to resubmit more than 5 times. This seems to be caused by {{FetchFailed.bmAddress}} is null . I don't know how to reproduce it. master log: {noformat} 14/08/09 21:50:17 INFO scheduler.TaskSetManager: Starting task 1.189:276 as TID 52334 on executor 82: sanshan (PROCESS_LOCAL) 14/08/09 21:50:17 INFO scheduler.TaskSetManager: Serialized task 1.189:276 as 3060 bytes in 0 ms 14/08/09 21:50:17 INFO scheduler.TaskSetManager: Starting task 1.189:277 as TID 52335 on executor 78: tuan231 (PROCESS_LOCAL) 14/08/09 21:50:17 INFO scheduler.TaskSetManager: Serialized task 1.189:277 as 3060 bytes in 0 ms 14/08/09 21:50:17 WARN scheduler.TaskSetManager: Lost TID 52199 (task 1.189:141) 14/08/09 21:50:17 WARN scheduler.TaskSetManager: Loss was due to fetch failure from null 14/08/09 21:50:17 INFO scheduler.DAGScheduler: Marking Stage 1 (distinct at DealCF.scala:215) for resubmision due to a fetch failure 14/08/09 21:50:17 INFO scheduler.DAGScheduler: The failed fetch was from Stage 2 (flatMap at DealCF.scala:207); marking it for resubmission 14/08/09 21:50:17 WARN scheduler.TaskSetManager: Loss was due to fetch failure from null 14/08/09 21:50:17 INFO scheduler.DAGScheduler: Marking Stage 1 (distinct at DealCF.scala:215) for resubmision due to a fetch failure 14/08/09 21:50:17 INFO scheduler.DAGScheduler: The failed fetch was from Stage 2 (flatMap at DealCF.scala:207); marking it for resubmission -- 5 times --- 14/08/09 21:50:17 INFO scheduler.DAGScheduler: Marking Stage 1 (distinct at DealCF.scala:215) for resubmision due to a fetch failure 14/08/09 21:50:17 INFO scheduler.DAGScheduler: The failed fetch was from Stage 2 (flatMap at DealCF.scala:207); marking it for resubmission 14/08/09 21:50:17 INFO cluster.YarnClientClusterScheduler: Removed TaskSet 1.189, whose tasks have all completed, from pool 14/08/09 21:50:17 INFO scheduler.TaskSetManager: Finished TID 1869 in 87398 ms on jilin (progress: 280/280) 14/08/09 21:50:17 INFO scheduler.DAGScheduler: Completed ShuffleMapTask(2, 269) 14/08/09 21:50:17 INFO cluster.YarnClientClusterScheduler: Removed TaskSet 2.1, whose tasks have all completed, from pool 14/08/09 21:50:17 INFO scheduler.DAGScheduler: Stage 2 (flatMap at DealCF.scala:207) finished in 129.544 s {noformat} worker: log {noformat} /1408/09 21:49:41 INFO spark.CacheManager: Partition rdd_23_57 not found, computing it 14/08/09 21:49:41 INFO spark.CacheManager: Partition rdd_23_191 not found, computing it 14/08/09 21:49:41 INFO executor.CoarseGrainedExecutorBackend: Got assigned task 18017 14/08/09 21:49:41 INFO executor.Executor: Running task ID 18017 14/08/09 21:49:41 INFO storage.BlockManager: Found block broadcast_1 locally 14/08/09 21:49:41 INFO storage.BlockManager: Found block broadcast_2 locally 14/08/09 21:49:41 INFO executor.CoarseGrainedExecutorBackend: Got assigned task 18151 14/08/09 21:49:41 INFO storage.BlockManager: Found block broadcast_0 locally 14/08/09 21:49:41 INFO executor.Executor: Running task ID 18151 14/08/09 21:49:41 INFO 
storage.BlockManager: Found block broadcast_1 locally 14/08/09 21:49:41 INFO storage.BlockManager: Found block broadcast_2 locally 14/08/09 21:49:41 INFO storage.BlockManager: Found block broadcast_0 locally 14/08/09 21:49:41 INFO spark.CacheManager: Partition rdd_23_86 not found, computing it 14/08/09 21:49:41 INFO spark.CacheManager: Partition rdd_23_220 not found, computing it 14/08/09 21:49:41 INFO executor.CoarseGrainedExecutorBackend: Got assigned task 18285 14/08/09 21:49:41 INFO executor.Executor: Running task ID 18285 14/08/09 21:49:41 INFO storage.BlockManager: Found block broadcast_1 locally 14/08/09 21:49:41 INFO storage.BlockManager: Found block broadcast_2 locally 14/08/09 21:49:41 INFO executor.CoarseGrainedExecutorBackend: Got assigned task 18419 14/08/09 21:49:41 INFO executor.Executor: Running task ID 18419 14/08/09 21:49:41 INFO storage.BlockManager: Found block broadcast_0 locally 14/08/09 21:49:41 INFO storage.BlockManager: Found block broadcast_1 locally 14/08/09 21:49:41 INFO storage.BlockManager: Found block broadcast_2 locally 14/08/09 21:49:41 INFO
[jira] [Updated] (SPARK-2947) DAGScheduler resubmit the task into an infinite loop
[ https://issues.apache.org/jira/browse/SPARK-2947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guoqiang Li updated SPARK-2947: --- Summary: DAGScheduler resubmit the task into an infinite loop (was: DAGScheduler scheduling infinite loop) DAGScheduler resubmit the task into an infinite loop Key: SPARK-2947 URL: https://issues.apache.org/jira/browse/SPARK-2947 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.0.0, 1.0.2 Reporter: Guoqiang Li Priority: Blocker Fix For: 1.1.0, 1.0.3 Stage to resubmit more than 5 times. This seems to be caused by {{FetchFailed.bmAddress}} is null . I don't know how to reproduce it. master log: {noformat} 14/08/09 21:50:17 INFO scheduler.TaskSetManager: Starting task 1.189:276 as TID 52334 on executor 82: sanshan (PROCESS_LOCAL) 14/08/09 21:50:17 INFO scheduler.TaskSetManager: Serialized task 1.189:276 as 3060 bytes in 0 ms 14/08/09 21:50:17 INFO scheduler.TaskSetManager: Starting task 1.189:277 as TID 52335 on executor 78: tuan231 (PROCESS_LOCAL) 14/08/09 21:50:17 INFO scheduler.TaskSetManager: Serialized task 1.189:277 as 3060 bytes in 0 ms 14/08/09 21:50:17 WARN scheduler.TaskSetManager: Lost TID 52199 (task 1.189:141) 14/08/09 21:50:17 WARN scheduler.TaskSetManager: Loss was due to fetch failure from null 14/08/09 21:50:17 INFO scheduler.DAGScheduler: Marking Stage 1 (distinct at DealCF.scala:215) for resubmision due to a fetch failure 14/08/09 21:50:17 INFO scheduler.DAGScheduler: The failed fetch was from Stage 2 (flatMap at DealCF.scala:207); marking it for resubmission 14/08/09 21:50:17 WARN scheduler.TaskSetManager: Loss was due to fetch failure from null 14/08/09 21:50:17 INFO scheduler.DAGScheduler: Marking Stage 1 (distinct at DealCF.scala:215) for resubmision due to a fetch failure 14/08/09 21:50:17 INFO scheduler.DAGScheduler: The failed fetch was from Stage 2 (flatMap at DealCF.scala:207); marking it for resubmission -- 5 times --- 14/08/09 21:50:17 INFO scheduler.DAGScheduler: Marking Stage 1 (distinct at DealCF.scala:215) for resubmision due to a fetch failure 14/08/09 21:50:17 INFO scheduler.DAGScheduler: The failed fetch was from Stage 2 (flatMap at DealCF.scala:207); marking it for resubmission 14/08/09 21:50:17 INFO cluster.YarnClientClusterScheduler: Removed TaskSet 1.189, whose tasks have all completed, from pool 14/08/09 21:50:17 INFO scheduler.TaskSetManager: Finished TID 1869 in 87398 ms on jilin (progress: 280/280) 14/08/09 21:50:17 INFO scheduler.DAGScheduler: Completed ShuffleMapTask(2, 269) 14/08/09 21:50:17 INFO cluster.YarnClientClusterScheduler: Removed TaskSet 2.1, whose tasks have all completed, from pool 14/08/09 21:50:17 INFO scheduler.DAGScheduler: Stage 2 (flatMap at DealCF.scala:207) finished in 129.544 s {noformat} worker: log {noformat} /1408/09 21:49:41 INFO spark.CacheManager: Partition rdd_23_57 not found, computing it 14/08/09 21:49:41 INFO spark.CacheManager: Partition rdd_23_191 not found, computing it 14/08/09 21:49:41 INFO executor.CoarseGrainedExecutorBackend: Got assigned task 18017 14/08/09 21:49:41 INFO executor.Executor: Running task ID 18017 14/08/09 21:49:41 INFO storage.BlockManager: Found block broadcast_1 locally 14/08/09 21:49:41 INFO storage.BlockManager: Found block broadcast_2 locally 14/08/09 21:49:41 INFO executor.CoarseGrainedExecutorBackend: Got assigned task 18151 14/08/09 21:49:41 INFO storage.BlockManager: Found block broadcast_0 locally 14/08/09 21:49:41 INFO executor.Executor: Running task ID 18151 14/08/09 21:49:41 INFO storage.BlockManager: 
Found block broadcast_1 locally 14/08/09 21:49:41 INFO storage.BlockManager: Found block broadcast_2 locally 14/08/09 21:49:41 INFO storage.BlockManager: Found block broadcast_0 locally 14/08/09 21:49:41 INFO spark.CacheManager: Partition rdd_23_86 not found, computing it 14/08/09 21:49:41 INFO spark.CacheManager: Partition rdd_23_220 not found, computing it 14/08/09 21:49:41 INFO executor.CoarseGrainedExecutorBackend: Got assigned task 18285 14/08/09 21:49:41 INFO executor.Executor: Running task ID 18285 14/08/09 21:49:41 INFO storage.BlockManager: Found block broadcast_1 locally 14/08/09 21:49:41 INFO storage.BlockManager: Found block broadcast_2 locally 14/08/09 21:49:41 INFO executor.CoarseGrainedExecutorBackend: Got assigned task 18419 14/08/09 21:49:41 INFO executor.Executor: Running task ID 18419 14/08/09 21:49:41 INFO storage.BlockManager: Found block broadcast_0 locally 14/08/09 21:49:41 INFO storage.BlockManager: Found block broadcast_1 locally 14/08/09 21:49:41 INFO storage.BlockManager: Found block broadcast_2 locally 14/08/09 21:49:41 INFO
[jira] [Commented] (SPARK-2947) DAGScheduler resubmit the stage into an infinite loop
[ https://issues.apache.org/jira/browse/SPARK-2947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14092100#comment-14092100 ] Apache Spark commented on SPARK-2947: - User 'witgo' has created a pull request for this issue: https://github.com/apache/spark/pull/1877 DAGScheduler resubmit the stage into an infinite loop - Key: SPARK-2947 URL: https://issues.apache.org/jira/browse/SPARK-2947 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.0.0, 1.0.2 Reporter: Guoqiang Li Priority: Blocker Fix For: 1.1.0, 1.0.3 Stage to resubmit more than 5 times. This seems to be caused by {{FetchFailed.bmAddress}} is null . I don't know how to reproduce it. master log: {noformat} 14/08/09 21:50:17 INFO scheduler.TaskSetManager: Starting task 1.189:276 as TID 52334 on executor 82: sanshan (PROCESS_LOCAL) 14/08/09 21:50:17 INFO scheduler.TaskSetManager: Serialized task 1.189:276 as 3060 bytes in 0 ms 14/08/09 21:50:17 INFO scheduler.TaskSetManager: Starting task 1.189:277 as TID 52335 on executor 78: tuan231 (PROCESS_LOCAL) 14/08/09 21:50:17 INFO scheduler.TaskSetManager: Serialized task 1.189:277 as 3060 bytes in 0 ms 14/08/09 21:50:17 WARN scheduler.TaskSetManager: Lost TID 52199 (task 1.189:141) 14/08/09 21:50:17 WARN scheduler.TaskSetManager: Loss was due to fetch failure from null 14/08/09 21:50:17 INFO scheduler.DAGScheduler: Marking Stage 1 (distinct at DealCF.scala:215) for resubmision due to a fetch failure 14/08/09 21:50:17 INFO scheduler.DAGScheduler: The failed fetch was from Stage 2 (flatMap at DealCF.scala:207); marking it for resubmission 14/08/09 21:50:17 WARN scheduler.TaskSetManager: Loss was due to fetch failure from null 14/08/09 21:50:17 INFO scheduler.DAGScheduler: Marking Stage 1 (distinct at DealCF.scala:215) for resubmision due to a fetch failure 14/08/09 21:50:17 INFO scheduler.DAGScheduler: The failed fetch was from Stage 2 (flatMap at DealCF.scala:207); marking it for resubmission -- 5 times --- 14/08/09 21:50:17 INFO scheduler.DAGScheduler: Marking Stage 1 (distinct at DealCF.scala:215) for resubmision due to a fetch failure 14/08/09 21:50:17 INFO scheduler.DAGScheduler: The failed fetch was from Stage 2 (flatMap at DealCF.scala:207); marking it for resubmission 14/08/09 21:50:17 INFO cluster.YarnClientClusterScheduler: Removed TaskSet 1.189, whose tasks have all completed, from pool 14/08/09 21:50:17 INFO scheduler.TaskSetManager: Finished TID 1869 in 87398 ms on jilin (progress: 280/280) 14/08/09 21:50:17 INFO scheduler.DAGScheduler: Completed ShuffleMapTask(2, 269) 14/08/09 21:50:17 INFO cluster.YarnClientClusterScheduler: Removed TaskSet 2.1, whose tasks have all completed, from pool 14/08/09 21:50:17 INFO scheduler.DAGScheduler: Stage 2 (flatMap at DealCF.scala:207) finished in 129.544 s {noformat} worker: log {noformat} /1408/09 21:49:41 INFO spark.CacheManager: Partition rdd_23_57 not found, computing it 14/08/09 21:49:41 INFO spark.CacheManager: Partition rdd_23_191 not found, computing it 14/08/09 21:49:41 INFO executor.CoarseGrainedExecutorBackend: Got assigned task 18017 14/08/09 21:49:41 INFO executor.Executor: Running task ID 18017 14/08/09 21:49:41 INFO storage.BlockManager: Found block broadcast_1 locally 14/08/09 21:49:41 INFO storage.BlockManager: Found block broadcast_2 locally 14/08/09 21:49:41 INFO executor.CoarseGrainedExecutorBackend: Got assigned task 18151 14/08/09 21:49:41 INFO storage.BlockManager: Found block broadcast_0 locally 14/08/09 21:49:41 INFO executor.Executor: Running task ID 18151 
14/08/09 21:49:41 INFO storage.BlockManager: Found block broadcast_1 locally 14/08/09 21:49:41 INFO storage.BlockManager: Found block broadcast_2 locally 14/08/09 21:49:41 INFO storage.BlockManager: Found block broadcast_0 locally 14/08/09 21:49:41 INFO spark.CacheManager: Partition rdd_23_86 not found, computing it 14/08/09 21:49:41 INFO spark.CacheManager: Partition rdd_23_220 not found, computing it 14/08/09 21:49:41 INFO executor.CoarseGrainedExecutorBackend: Got assigned task 18285 14/08/09 21:49:41 INFO executor.Executor: Running task ID 18285 14/08/09 21:49:41 INFO storage.BlockManager: Found block broadcast_1 locally 14/08/09 21:49:41 INFO storage.BlockManager: Found block broadcast_2 locally 14/08/09 21:49:41 INFO executor.CoarseGrainedExecutorBackend: Got assigned task 18419 14/08/09 21:49:41 INFO executor.Executor: Running task ID 18419 14/08/09 21:49:41 INFO storage.BlockManager: Found block broadcast_0 locally 14/08/09 21:49:41 INFO storage.BlockManager: Found block broadcast_1 locally 14/08/09 21:49:41 INFO storage.BlockManager: Found block broadcast_2
[jira] [Updated] (SPARK-1297) Upgrade HBase dependency to 0.98.0
[ https://issues.apache.org/jira/browse/SPARK-1297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated SPARK-1297: -- Attachment: spark-1297-v2.txt Tentative patch adds hbase-hadoop2 profile. Upgrade HBase dependency to 0.98.0 -- Key: SPARK-1297 URL: https://issues.apache.org/jira/browse/SPARK-1297 Project: Spark Issue Type: Task Reporter: Ted Yu Priority: Minor Attachments: spark-1297-v2.txt HBase 0.94.6 was released 11 months ago. Upgrade HBase dependency to 0.98.0 -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-1297) Upgrade HBase dependency to 0.98.0
[ https://issues.apache.org/jira/browse/SPARK-1297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated SPARK-1297: -- Attachment: (was: spark-1297-v2.txt) Upgrade HBase dependency to 0.98.0 -- Key: SPARK-1297 URL: https://issues.apache.org/jira/browse/SPARK-1297 Project: Spark Issue Type: Task Reporter: Ted Yu Priority: Minor HBase 0.94.6 was released 11 months ago. Upgrade HBase dependency to 0.98.0 -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-1297) Upgrade HBase dependency to 0.98.0
[ https://issues.apache.org/jira/browse/SPARK-1297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated SPARK-1297: -- Attachment: spark-1297-v2.txt Upgrade HBase dependency to 0.98.0 -- Key: SPARK-1297 URL: https://issues.apache.org/jira/browse/SPARK-1297 Project: Spark Issue Type: Task Reporter: Ted Yu Priority: Minor Attachments: spark-1297-v2.txt HBase 0.94.6 was released 11 months ago. Upgrade HBase dependency to 0.98.0 -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-1297) Upgrade HBase dependency to 0.98.0
[ https://issues.apache.org/jira/browse/SPARK-1297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14092115#comment-14092115 ] Sean Owen commented on SPARK-1297: -- This doesn't work with Hadoop 1 though. It also requires turning on an HBase profile for every build. See my comments above; I think this can be made friendlier with more work in the profiles. I think it requires a hadoop1 profile to really solve this kind of problem for every component, not just HBase. Upgrade HBase dependency to 0.98.0 -- Key: SPARK-1297 URL: https://issues.apache.org/jira/browse/SPARK-1297 Project: Spark Issue Type: Task Reporter: Ted Yu Priority: Minor Attachments: spark-1297-v2.txt HBase 0.94.6 was released 11 months ago. Upgrade HBase dependency to 0.98.0 -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-2944) sc.makeRDD doesn't distribute partitions evenly
[ https://issues.apache.org/jira/browse/SPARK-2944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14092135#comment-14092135 ] Xiangrui Meng commented on SPARK-2944: -- Found that this behavior is not deterministic, so it is hard to tell which commit introduced it. It seems to happen when tasks are very small. Some workers may get a lot more assignments than others because they finish the tasks very quickly and TaskSetManager always picks the first available one. (There is no randomization in `TaskSetManager`.) sc.makeRDD doesn't distribute partitions evenly --- Key: SPARK-2944 URL: https://issues.apache.org/jira/browse/SPARK-2944 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.1.0 Reporter: Xiangrui Meng Assignee: Xiangrui Meng Priority: Critical 16-node EC2 cluster: {code} val rdd = sc.makeRDD(0 until 1e9.toInt, 1000).cache() rdd.count() {code} Saw 156 partitions on one node while only 8 partitions on another. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
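The reproduction in the issue is Scala; below is a rough PySpark equivalent for checking how cached partitions end up spread across hosts. This is a diagnostic sketch only, and it assumes the cluster reports distinct hostnames and that the follow-up job reads the cached blocks where they were stored:
{code}
import socket
from pyspark import SparkContext

sc = SparkContext()

# Many small tasks, cached, echoing the Scala snippet above (smaller range).
rdd = sc.parallelize(range(1000000), 1000).cache()
rdd.count()

# Count how many of the 1000 partitions each host ended up holding.
per_host = rdd.mapPartitions(lambda it: [socket.gethostname()]).countByValue()
for host, n in sorted(per_host.items()):
    print("%s: %d" % (host, n))
{code}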
[jira] [Updated] (SPARK-2950) Add GC time and Shuffle Write time to JobLogger output
[ https://issues.apache.org/jira/browse/SPARK-2950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shivaram Venkataraman updated SPARK-2950: - Fix Version/s: 1.2.0 Add GC time and Shuffle Write time to JobLogger output -- Key: SPARK-2950 URL: https://issues.apache.org/jira/browse/SPARK-2950 Project: Spark Issue Type: Improvement Reporter: Shivaram Venkataraman Assignee: Shivaram Venkataraman Priority: Minor Fix For: 1.2.0 The JobLogger is very useful for performing offline performance profiling of Spark jobs. GC Time and Shuffle Write time are available in TaskMetrics but are currently missed from the JobLogger output. This change adds these two fields. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-2950) Add GC time and Shuffle Write time to JobLogger output
[ https://issues.apache.org/jira/browse/SPARK-2950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shivaram Venkataraman resolved SPARK-2950. -- Resolution: Fixed Add GC time and Shuffle Write time to JobLogger output -- Key: SPARK-2950 URL: https://issues.apache.org/jira/browse/SPARK-2950 Project: Spark Issue Type: Improvement Reporter: Shivaram Venkataraman Assignee: Shivaram Venkataraman Priority: Minor Fix For: 1.2.0 The JobLogger is very useful for performing offline performance profiling of Spark jobs. GC Time and Shuffle Write time are available in TaskMetrics but are currently missed from the JobLogger output. This change adds these two fields. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-2898) Failed to connect to daemon
[ https://issues.apache.org/jira/browse/SPARK-2898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen resolved SPARK-2898. --- Resolution: Fixed Fix Version/s: 1.1.0 Failed to connect to daemon --- Key: SPARK-2898 URL: https://issues.apache.org/jira/browse/SPARK-2898 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 1.1.0 Reporter: Davies Liu Assignee: Davies Liu Fix For: 1.1.0 There is a deadlock in handle_sigchld() because of logging Java options: -Dspark.storage.memoryFraction=0.66 -Dspark.serializer=org.apache.spark.serializer.JavaSerializer -Dspark.executor.memory=3g -Dspark.locality.wait=6000 Options: SchedulerThroughputTest --num-tasks=1 --num-trials=4 --inter-trial-wait=1 14/08/06 22:09:41 WARN JettyUtils: Failed to create UI on port 4040. Trying again on port 4041. - Failure(java.net.BindException: Address already in use) worker 50114 crashed abruptly with exit status 1 14/08/06 22:10:37 ERROR Executor: Exception in task 1476.0 in stage 1.0 (TID 11476) org.apache.spark.SparkException: Python worker exited unexpectedly (crashed) at org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:150) at org.apache.spark.api.python.PythonRDD$$anon$1.init(PythonRDD.scala:154) at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:87) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262) at org.apache.spark.rdd.RDD.iterator(RDD.scala:229) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62) at org.apache.spark.scheduler.Task.run(Task.scala:54) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Caused by: java.io.EOFException at java.io.DataInputStream.readInt(DataInputStream.java:392) at org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:101) ... 
10 more 14/08/06 22:10:37 WARN PythonWorkerFactory: Failed to open socket to Python daemon: java.net.ConnectException: Connection refused at java.net.PlainSocketImpl.socketConnect(Native Method) at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339) at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200) at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182) at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) at java.net.Socket.connect(Socket.java:579) at java.net.Socket.connect(Socket.java:528) at java.net.Socket.init(Socket.java:425) at java.net.Socket.init(Socket.java:241) at org.apache.spark.api.python.PythonWorkerFactory.createSocket$1(PythonWorkerFactory.scala:68) at org.apache.spark.api.python.PythonWorkerFactory.liftedTree1$1(PythonWorkerFactory.scala:83) at org.apache.spark.api.python.PythonWorkerFactory.createThroughDaemon(PythonWorkerFactory.scala:82) at org.apache.spark.api.python.PythonWorkerFactory.create(PythonWorkerFactory.scala:55) at org.apache.spark.SparkEnv.createPythonWorker(SparkEnv.scala:101) at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:66) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262) at org.apache.spark.rdd.RDD.iterator(RDD.scala:229) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62) at org.apache.spark.scheduler.Task.run(Task.scala:54) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) 14/08/06 22:10:37 ERROR Executor: Exception in task 1478.0 in stage 1.0 (TID 11478) java.io.EOFException at java.io.DataInputStream.readInt(DataInputStream.java:392) at org.apache.spark.api.python.PythonWorkerFactory.createSocket$1(PythonWorkerFactory.scala:69) at org.apache.spark.api.python.PythonWorkerFactory.liftedTree1$1(PythonWorkerFactory.scala:83) at org.apache.spark.api.python.PythonWorkerFactory.createThroughDaemon(PythonWorkerFactory.scala:82) at org.apache.spark.api.python.PythonWorkerFactory.create(PythonWorkerFactory.scala:55) at
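For background on the first sentence of the description above ("a deadlock in handle_sigchld() because of logging"): Python's logging module takes a module-level lock, so a signal handler that logs can block forever if the signal interrupts code that already holds that lock. A minimal, assumed illustration of the safe pattern — not the actual daemon.py code:
{code}
import os
import signal

def handle_sigchld(signum, frame):
    # Reap exited worker processes without blocking.
    try:
        while True:
            pid, status = os.waitpid(-1, os.WNOHANG)
            if pid == 0:
                break
            # Avoid calling logging here: logging acquires an internal lock,
            # and if SIGCHLD arrived while that lock was already held, the
            # handler would deadlock waiting for it.
    except OSError:
        pass

signal.signal(signal.SIGCHLD, handle_sigchld)
{code}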
[jira] [Created] (SPARK-2955) Test code fails to compile with mvn compile without install
Sean Owen created SPARK-2955: Summary: Test code fails to compile with mvn compile without install Key: SPARK-2955 URL: https://issues.apache.org/jira/browse/SPARK-2955 Project: Spark Issue Type: Bug Components: Build Affects Versions: 1.0.2 Reporter: Sean Owen Priority: Minor (This is the corrected follow-up to https://issues.apache.org/jira/browse/SPARK-2903 ) Right now, mvn compile test-compile fails to compile Spark. (Don't worry; mvn package works, so this is not major.) The issue stems from test code in some modules depending on test code in other modules. That is perfectly fine and supported by Maven. It takes extra work to get this to work with scalatest, and this has been attempted: https://github.com/apache/spark/blob/master/sql/catalyst/pom.xml#L86 This formulation is not quite enough, since the SQL Core module's tests fail to compile for lack of finding test classes in SQL Catalyst, and likewise for most Streaming integration modules depending on core Streaming test code. Example: {code} [error] /Users/srowen/Documents/spark/sql/core/src/test/scala/org/apache/spark/sql/QueryTest.scala:23: not found: type PlanTest [error] class QueryTest extends PlanTest { [error] ^ [error] /Users/srowen/Documents/spark/sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala:28: package org.apache.spark.sql.test is not a value [error] test(SPARK-1669: cacheTable should be idempotent) { [error] ^ ... {code} The issue I believe is that generation of a test-jar is bound here to the compile phase, but the test classes are not being compiled in this phase. It should bind to the test-compile phase. It works when executing mvn package or mvn install since test-jar artifacts are actually generated available through normal Maven mechanisms as each module is built. They are then found normally, regardless of scalatest configuration. It would be nice for a simple mvn compile test-compile to work since the test code is perfectly compilable given the Maven declarations. On the plus side, this change is low-risk as it only affects tests. [~yhuai] made the original scalatest change and has glanced at this and thinks it makes sense. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-2955) Test code fails to compile with mvn compile without install
[ https://issues.apache.org/jira/browse/SPARK-2955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14092180#comment-14092180 ] Apache Spark commented on SPARK-2955: - User 'srowen' has created a pull request for this issue: https://github.com/apache/spark/pull/1879 Test code fails to compile with mvn compile without install Key: SPARK-2955 URL: https://issues.apache.org/jira/browse/SPARK-2955 Project: Spark Issue Type: Bug Components: Build Affects Versions: 1.0.2 Reporter: Sean Owen Priority: Minor Labels: build, compile, scalatest, test, test-compile (This is the corrected follow-up to https://issues.apache.org/jira/browse/SPARK-2903 ) Right now, mvn compile test-compile fails to compile Spark. (Don't worry; mvn package works, so this is not major.) The issue stems from test code in some modules depending on test code in other modules. That is perfectly fine and supported by Maven. It takes extra work to get this to work with scalatest, and this has been attempted: https://github.com/apache/spark/blob/master/sql/catalyst/pom.xml#L86 This formulation is not quite enough, since the SQL Core module's tests fail to compile for lack of finding test classes in SQL Catalyst, and likewise for most Streaming integration modules depending on core Streaming test code. Example: {code} [error] /Users/srowen/Documents/spark/sql/core/src/test/scala/org/apache/spark/sql/QueryTest.scala:23: not found: type PlanTest [error] class QueryTest extends PlanTest { [error] ^ [error] /Users/srowen/Documents/spark/sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala:28: package org.apache.spark.sql.test is not a value [error] test(SPARK-1669: cacheTable should be idempotent) { [error] ^ ... {code} The issue I believe is that generation of a test-jar is bound here to the compile phase, but the test classes are not being compiled in this phase. It should bind to the test-compile phase. It works when executing mvn package or mvn install since test-jar artifacts are actually generated available through normal Maven mechanisms as each module is built. They are then found normally, regardless of scalatest configuration. It would be nice for a simple mvn compile test-compile to work since the test code is perfectly compilable given the Maven declarations. On the plus side, this change is low-risk as it only affects tests. [~yhuai] made the original scalatest change and has glanced at this and thinks it makes sense. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-2650) Caching tables larger than memory causes OOMs
[ https://issues.apache.org/jira/browse/SPARK-2650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-2650: Summary: Caching tables larger than memory causes OOMs (was: Wrong initial sizes for in-memory column buffers) Caching tables larger than memory causes OOMs - Key: SPARK-2650 URL: https://issues.apache.org/jira/browse/SPARK-2650 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.0.0, 1.0.1 Reporter: Michael Armbrust Assignee: Cheng Lian Priority: Critical The logic for setting up the initial column buffers is different for Spark SQL compared to Shark, and I'm seeing OOMs when caching tables that are larger than available memory (where Shark was okay). Two suspicious things: the initialSize is always set to 0, so we always go with the default. The default looks like it was copied from code like 10 * 1024 * 1024... but in Spark SQL it's 10 * 102 * 1024. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-2650) Caching tables larger than memory causes OOMs
[ https://issues.apache.org/jira/browse/SPARK-2650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14092209#comment-14092209 ] Apache Spark commented on SPARK-2650: - User 'marmbrus' has created a pull request for this issue: https://github.com/apache/spark/pull/1880 Caching tables larger than memory causes OOMs - Key: SPARK-2650 URL: https://issues.apache.org/jira/browse/SPARK-2650 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.0.0, 1.0.1 Reporter: Michael Armbrust Assignee: Cheng Lian Priority: Critical The logic for setting up the initial column buffers is different for Spark SQL compared to Shark, and I'm seeing OOMs when caching tables that are larger than available memory (where Shark was okay). Two suspicious things: the initialSize is always set to 0, so we always go with the default. The default looks like it was copied from code like 10 * 1024 * 1024... but in Spark SQL it's 10 * 102 * 1024. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
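For reference, here is a minimal sketch of the arithmetic behind that suspicion (the value names are hypothetical, not the actual Spark SQL fields):
{code}
// Suspected copy/paste typo: 102 where 1024 was likely intended, so the default
// column buffer size ends up at roughly 1 MB rather than the presumably intended 10 MB.
val sparkSqlDefault = 10 * 102 * 1024   //  1,044,480 bytes (~1 MB)
val intendedDefault = 10 * 1024 * 1024  // 10,485,760 bytes (10 MB)
{code}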
[jira] [Resolved] (SPARK-2937) Separate out sampleByKeyExact in PairRDDFunctions as its own API
[ https://issues.apache.org/jira/browse/SPARK-2937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng resolved SPARK-2937. -- Resolution: Fixed Fix Version/s: 1.1.0 Issue resolved by pull request 1866 [https://github.com/apache/spark/pull/1866] Separate out sampleByKeyExact in PairRDDFunctions as its own API Key: SPARK-2937 URL: https://issues.apache.org/jira/browse/SPARK-2937 Project: Spark Issue Type: Improvement Components: Spark Core Reporter: Doris Xin Assignee: Doris Xin Fix For: 1.1.0 -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-2956) Support transferring large blocks in Netty network module
[ https://issues.apache.org/jira/browse/SPARK-2956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-2956: --- Summary: Support transferring large blocks in Netty network module (was: Support transferring blocks larger than MTU size) Support transferring large blocks in Netty network module - Key: SPARK-2956 URL: https://issues.apache.org/jira/browse/SPARK-2956 Project: Spark Issue Type: Sub-task Components: Shuffle, Spark Core Reporter: Reynold Xin Assignee: Reynold Xin Priority: Critical The existing Netty shuffle implementation does not support large blocks. The culprit is in FileClientHandler.channelRead0(). -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-2956) Support transferring blocks larger than MTU size
Reynold Xin created SPARK-2956: -- Summary: Support transferring blocks larger than MTU size Key: SPARK-2956 URL: https://issues.apache.org/jira/browse/SPARK-2956 Project: Spark Issue Type: Sub-task Reporter: Reynold Xin Assignee: Reynold Xin Priority: Critical The existing Netty shuffle implementation does not support large blocks. The culprit is in FileClientHandler.channelRead0(). -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-2957) Leverage Hadoop native io's fadvise and read-ahead in Netty transferTo
[ https://issues.apache.org/jira/browse/SPARK-2957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14092290#comment-14092290 ] Reynold Xin commented on SPARK-2957: cc [~tlipcon] [~t...@lipcon.org] will probably bug you when we work on this. Leverage Hadoop native io's fadvise and read-ahead in Netty transferTo -- Key: SPARK-2957 URL: https://issues.apache.org/jira/browse/SPARK-2957 Project: Spark Issue Type: Sub-task Components: Shuffle, Spark Core Reporter: Reynold Xin -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-2468) Netty-based shuffle network module
[ https://issues.apache.org/jira/browse/SPARK-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-2468: --- Summary: Netty-based shuffle network module (was: Netty based network communication) Netty-based shuffle network module -- Key: SPARK-2468 URL: https://issues.apache.org/jira/browse/SPARK-2468 Project: Spark Issue Type: Improvement Components: Shuffle, Spark Core Reporter: Reynold Xin Assignee: Reynold Xin Priority: Critical Right now shuffle send goes through the block manager. This is inefficient because it requires loading a block from disk into a kernel buffer, then into a user space buffer, and then back to a kernel send buffer before it reaches the NIC. It makes multiple copies of the data and context switches between kernel and user space. It also creates unnecessary buffers in the JVM that increase GC pressure. Instead, we should use FileChannel.transferTo, which handles this in kernel space with zero-copy. See http://www.ibm.com/developerworks/library/j-zerocopy/ One potential solution is to use Netty. Spark already has a Netty-based network module implemented (org.apache.spark.network.netty). However, it lacks some functionality and is turned off by default. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-2468) Netty based network communication
[ https://issues.apache.org/jira/browse/SPARK-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-2468: --- Summary: Netty based network communication (was: zero-copy shuffle network communication) Netty based network communication - Key: SPARK-2468 URL: https://issues.apache.org/jira/browse/SPARK-2468 Project: Spark Issue Type: Improvement Components: Shuffle, Spark Core Reporter: Reynold Xin Assignee: Reynold Xin Priority: Critical Right now shuffle send goes through the block manager. This is inefficient because it requires loading a block from disk into a kernel buffer, then into a user space buffer, and then back to a kernel send buffer before it reaches the NIC. It makes multiple copies of the data and context switches between kernel and user space. It also creates unnecessary buffers in the JVM that increase GC pressure. Instead, we should use FileChannel.transferTo, which handles this in kernel space with zero-copy. See http://www.ibm.com/developerworks/library/j-zerocopy/ One potential solution is to use Netty. Spark already has a Netty-based network module implemented (org.apache.spark.network.netty). However, it lacks some functionality and is turned off by default. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
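To make the zero-copy path described above concrete, here is a minimal Scala sketch (not Spark's actual shuffle code; sendBlock and its parameters are hypothetical) showing how FileChannel.transferTo lets the kernel move file bytes straight to the socket without copying them through user-space buffers:
{code}
import java.io.RandomAccessFile
import java.nio.channels.WritableByteChannel

def sendBlock(path: String, offset: Long, length: Long, target: WritableByteChannel): Unit = {
  val file = new RandomAccessFile(path, "r")
  try {
    val channel = file.getChannel
    var written = 0L
    // transferTo may send fewer bytes than requested, so loop until the block is fully sent.
    while (written < length) {
      written += channel.transferTo(offset + written, length - written, target)
    }
  } finally {
    file.close()
  }
}
{code}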
[jira] [Created] (SPARK-2957) Leverage Hadoop native io's fadvise and read-ahead in Netty transferTo
Reynold Xin created SPARK-2957: -- Summary: Leverage Hadoop native io's fadvise and read-ahead in Netty transferTo Key: SPARK-2957 URL: https://issues.apache.org/jira/browse/SPARK-2957 Project: Spark Issue Type: Sub-task Reporter: Reynold Xin -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-2956) Support transferring large blocks in Netty network module
[ https://issues.apache.org/jira/browse/SPARK-2956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-2956: --- Description: The existing Netty shuffle implementation does not support large blocks. The culprit is in FileClientHandler.channelRead0(). We should add a LengthFieldBasedFrameDecoder to the pipeline. was: The existing Netty shuffle implementation does not support large blocks. The culprit is in FileClientHandler.channelRead0(). Support transferring large blocks in Netty network module - Key: SPARK-2956 URL: https://issues.apache.org/jira/browse/SPARK-2956 Project: Spark Issue Type: Sub-task Components: Shuffle, Spark Core Reporter: Reynold Xin Assignee: Reynold Xin Priority: Critical The existing Netty shuffle implementation does not support large blocks. The culprit is in FileClientHandler.channelRead0(). We should add a LengthFieldBasedFrameDecoder to the pipeline. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
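A minimal sketch of the proposed framing change, assuming Netty 4.x (the class names below are hypothetical, not Spark's actual handlers):
{code}
import io.netty.channel.ChannelInitializer
import io.netty.channel.socket.SocketChannel
import io.netty.handler.codec.{LengthFieldBasedFrameDecoder, LengthFieldPrepender}

class BlockFrameInitializer(maxBlockBytes: Int) extends ChannelInitializer[SocketChannel] {
  override def initChannel(ch: SocketChannel): Unit = {
    val pipeline = ch.pipeline()
    // Reassemble frames of up to maxBlockBytes: the 4-byte length prefix sits at offset 0
    // and is stripped before the payload reaches downstream handlers, so blocks larger
    // than a single read (or the MTU) arrive as one message.
    pipeline.addLast("frameDecoder", new LengthFieldBasedFrameDecoder(maxBlockBytes, 0, 4, 0, 4))
    // Outbound side: prepend the 4-byte length so the peer can do the same framing.
    pipeline.addLast("frameEncoder", new LengthFieldPrepender(4))
    // ... application handlers (e.g. the block fetch handler) go after the framing ...
  }
}
{code}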
[jira] [Commented] (SPARK-2957) Leverage Hadoop native io's fadvise and read-ahead in Netty transferTo
[ https://issues.apache.org/jira/browse/SPARK-2957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14092294#comment-14092294 ] Todd Lipcon commented on SPARK-2957: Sure, happy to help Leverage Hadoop native io's fadvise and read-ahead in Netty transferTo -- Key: SPARK-2957 URL: https://issues.apache.org/jira/browse/SPARK-2957 Project: Spark Issue Type: Sub-task Components: Shuffle, Spark Core Reporter: Reynold Xin -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-2958) FileClientHandler should not be shared in the pipeline
Reynold Xin created SPARK-2958: -- Summary: FileClientHandler should not be shared in the pipeline Key: SPARK-2958 URL: https://issues.apache.org/jira/browse/SPARK-2958 Project: Spark Issue Type: Bug Reporter: Reynold Xin The Netty module creates a single FileClientHandler and shares it across all threads. We should create a new one for each pipeline thread instead. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
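As a rough illustration of that fix (the names below are hypothetical, not the actual FileClientHandler), a stateful inbound handler should be created per channel inside the initializer rather than held in a single shared instance:
{code}
import io.netty.channel.{ChannelHandlerContext, ChannelInboundHandlerAdapter, ChannelInitializer}
import io.netty.channel.socket.SocketChannel

// Keeping per-connection state like this is only safe when the instance is not shared.
class BlockFetchHandler extends ChannelInboundHandlerAdapter {
  private var bytesReceived = 0L
  override def channelRead(ctx: ChannelHandlerContext, msg: AnyRef): Unit = {
    bytesReceived += 1 // placeholder bookkeeping; real code would track the block bytes read
    ctx.fireChannelRead(msg)
  }
}

class BlockFetchInitializer extends ChannelInitializer[SocketChannel] {
  override def initChannel(ch: SocketChannel): Unit = {
    // A fresh handler per channel, instead of one instance shared by all threads.
    ch.pipeline().addLast("blockFetchHandler", new BlockFetchHandler)
  }
}
{code}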
[jira] [Created] (SPARK-2959) Use a single FileClient and Netty client thread pool
Reynold Xin created SPARK-2959: -- Summary: Use a single FileClient and Netty client thread pool Key: SPARK-2959 URL: https://issues.apache.org/jira/browse/SPARK-2959 Project: Spark Issue Type: Improvement Reporter: Reynold Xin The current implementation creates a new Netty bootstrap for fetching each block. This is pretty crazy! We should reuse the bootstrap and the FileClient. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
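A minimal sketch of the reuse pattern being proposed, assuming Netty 4.x (this is not Spark's actual FileClient; the object and method names are made up for illustration):
{code}
import io.netty.bootstrap.Bootstrap
import io.netty.channel.{ChannelFuture, ChannelInitializer}
import io.netty.channel.nio.NioEventLoopGroup
import io.netty.channel.socket.SocketChannel
import io.netty.channel.socket.nio.NioSocketChannel
import io.netty.handler.codec.LengthFieldBasedFrameDecoder

object SharedBlockFetchClient {
  // Built once and shared: the client thread pool and the Bootstrap configuration.
  private val workerGroup = new NioEventLoopGroup()
  private val bootstrap = new Bootstrap()
    .group(workerGroup)
    .channel(classOf[NioSocketChannel])
    .handler(new ChannelInitializer[SocketChannel] {
      override def initChannel(ch: SocketChannel): Unit = {
        // Per-connection handlers; length-based framing as discussed in SPARK-2956.
        ch.pipeline().addLast(new LengthFieldBasedFrameDecoder(10 * 1024 * 1024, 0, 4, 0, 4))
      }
    })

  // Only the connection itself is created per block fetch.
  def connect(host: String, port: Int): ChannelFuture = bootstrap.connect(host, port)
}
{code}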
[jira] [Commented] (SPARK-2677) BasicBlockFetchIterator#next can wait forever
[ https://issues.apache.org/jira/browse/SPARK-2677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14092355#comment-14092355 ] Kousuke Saruta commented on SPARK-2677: --- SPARK-2538 was resolved, but this issue still remains. I tried to resolve this issue in https://github.com/apache/spark/pull/1632 BasicBlockFetchIterator#next can wait forever - Key: SPARK-2677 URL: https://issues.apache.org/jira/browse/SPARK-2677 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 0.9.2, 1.0.0, 1.0.1 Reporter: Kousuke Saruta Assignee: Josh Rosen Priority: Blocker In BasicBlockFetchIterator#next, it waits for a fetch result on results.take().
{code}
override def next(): (BlockId, Option[Iterator[Any]]) = {
  resultsGotten += 1
  val startFetchWait = System.currentTimeMillis()
  val result = results.take()
  val stopFetchWait = System.currentTimeMillis()
  _fetchWaitTime += (stopFetchWait - startFetchWait)
  if (!result.failed) bytesInFlight -= result.size
  while (!fetchRequests.isEmpty &&
    (bytesInFlight == 0 || bytesInFlight + fetchRequests.front.size <= maxBytesInFlight)) {
    sendRequest(fetchRequests.dequeue())
  }
  (result.blockId, if (result.failed) None else Some(result.deserialize()))
}
{code}
But results is implemented as a LinkedBlockingQueue, so if a remote executor hangs up, the fetching executor waits forever. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
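One way to avoid the indefinite wait is sketched below; this is only an illustration of the idea, not the fix in the linked pull request, and the helper name is hypothetical:
{code}
import java.util.concurrent.{LinkedBlockingQueue, TimeUnit, TimeoutException}

// Replace the unbounded results.take() with a bounded poll so a hung remote executor
// surfaces as a fetch failure instead of blocking next() forever.
def takeWithTimeout[A <: AnyRef](results: LinkedBlockingQueue[A], timeoutSeconds: Long): A = {
  val result = results.poll(timeoutSeconds, TimeUnit.SECONDS)
  if (result == null) {
    throw new TimeoutException(s"No fetch result received within $timeoutSeconds seconds")
  }
  result
}
{code}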
[jira] [Updated] (SPARK-2960) Spark executables fail to start via symlinks
[ https://issues.apache.org/jira/browse/SPARK-2960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shay Rojansky updated SPARK-2960: - Priority: Minor (was: Major) Spark executables fail to start via symlinks Key: SPARK-2960 URL: https://issues.apache.org/jira/browse/SPARK-2960 Project: Spark Issue Type: Bug Reporter: Shay Rojansky Priority: Minor Fix For: 1.0.2 The current scripts (e.g. pyspark) fail to run when they are executed via symlinks. A common Linux scenario would be to have Spark installed somewhere (e.g. /opt) and have a symlink to it in /usr/bin. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-2960) Spark executables fail to start via symlinks
[ https://issues.apache.org/jira/browse/SPARK-2960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shay Rojansky updated SPARK-2960: - Summary: Spark executables fail to start via symlinks (was: Spark executables failed to start via symlinks) Spark executables fail to start via symlinks Key: SPARK-2960 URL: https://issues.apache.org/jira/browse/SPARK-2960 Project: Spark Issue Type: Bug Reporter: Shay Rojansky Fix For: 1.0.2 The current scripts (e.g. pyspark) fail to run when they are executed via symlinks. A common Linux scenario would be to have Spark installed somewhere (e.g. /opt) and have a symlink to it in /usr/bin. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-2960) Spark executables failed to start via symlinks
Shay Rojansky created SPARK-2960: Summary: Spark executables failed to start via symlinks Key: SPARK-2960 URL: https://issues.apache.org/jira/browse/SPARK-2960 Project: Spark Issue Type: Bug Reporter: Shay Rojansky Fix For: 1.0.2 The current scripts (e.g. pyspark) fail to run when they are executed via symlinks. A common Linux scenario would be to have Spark installed somewhere (e.g. /opt) and have a symlink to it in /usr/bin. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-2961) Use statistics to skip partitions when reading from in-memory columnar data
Michael Armbrust created SPARK-2961: --- Summary: Use statistics to skip partitions when reading from in-memory columnar data Key: SPARK-2961 URL: https://issues.apache.org/jira/browse/SPARK-2961 Project: Spark Issue Type: Improvement Components: SQL Reporter: Michael Armbrust Assignee: Michael Armbrust -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-2961) Use statistics to skip partitions when reading from in-memory columnar data
[ https://issues.apache.org/jira/browse/SPARK-2961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-2961: Target Version/s: 1.1.0 Use statistics to skip partitions when reading from in-memory columnar data --- Key: SPARK-2961 URL: https://issues.apache.org/jira/browse/SPARK-2961 Project: Spark Issue Type: Improvement Components: SQL Reporter: Michael Armbrust Assignee: Michael Armbrust -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-2961) Use statistics to skip partitions when reading from in-memory columnar data
[ https://issues.apache.org/jira/browse/SPARK-2961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14092373#comment-14092373 ] Apache Spark commented on SPARK-2961: - User 'marmbrus' has created a pull request for this issue: https://github.com/apache/spark/pull/1883 Use statistics to skip partitions when reading from in-memory columnar data --- Key: SPARK-2961 URL: https://issues.apache.org/jira/browse/SPARK-2961 Project: Spark Issue Type: Improvement Components: SQL Reporter: Michael Armbrust Assignee: Michael Armbrust -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-2062) VertexRDD.apply does not use the mergeFunc
[ https://issues.apache.org/jira/browse/SPARK-2062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14092381#comment-14092381 ] Larry Xiao commented on SPARK-2062: --- Is anyone working on it? I want to take it. My plan is to add a pass to do the merge, is it ok? [~ankurd] VertexRDD.apply does not use the mergeFunc -- Key: SPARK-2062 URL: https://issues.apache.org/jira/browse/SPARK-2062 Project: Spark Issue Type: Bug Components: GraphX Reporter: Ankur Dave Assignee: Ankur Dave Here: https://github.com/apache/spark/blob/b1feb60209174433262de2a26d39616ba00edcc8/graphx/src/main/scala/org/apache/spark/graphx/VertexRDD.scala#L410 -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
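As a rough sketch of what such a merge pass could look like (this is only the idea from the comment, not the eventual patch; mergeDuplicates is a hypothetical helper), assuming Spark 1.x's pair-RDD implicits:
{code}
import scala.reflect.ClassTag
import org.apache.spark.SparkContext._ // pair-RDD implicits in Spark 1.x
import org.apache.spark.rdd.RDD

// Collapse duplicate vertex IDs with the caller-supplied mergeFunc before the index
// is built, so that the mergeFunc passed to VertexRDD.apply is actually honoured.
def mergeDuplicates[VD: ClassTag](
    vertices: RDD[(Long, VD)],
    mergeFunc: (VD, VD) => VD): RDD[(Long, VD)] =
  vertices.reduceByKey(mergeFunc)
{code}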
[jira] [Resolved] (SPARK-2936) Migrate Netty network module from Java to Scala
[ https://issues.apache.org/jira/browse/SPARK-2936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron Davidson resolved SPARK-2936. --- Resolution: Fixed Migrate Netty network module from Java to Scala --- Key: SPARK-2936 URL: https://issues.apache.org/jira/browse/SPARK-2936 Project: Spark Issue Type: Sub-task Components: Shuffle, Spark Core Affects Versions: 1.1.0 Reporter: Reynold Xin Assignee: Reynold Xin The Netty network module was originally written when Scala 2.9.x had a bug that prevented a pure Scala implementation, so a subset of the files were written in Java. We have since upgraded to Scala 2.10 and can now migrate all of the Java files to Scala. https://github.com/netty/netty/issues/781 https://github.com/mesos/spark/pull/522 -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-2962) Suboptimal scheduling in spark
Mridul Muralidharan created SPARK-2962: -- Summary: Suboptimal scheduling in spark Key: SPARK-2962 URL: https://issues.apache.org/jira/browse/SPARK-2962 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.1.0 Environment: All Reporter: Mridul Muralidharan In findTask, irrespective of 'locality' specified, pendingTasksWithNoPrefs are always scheduled with PROCESS_LOCAL pendingTasksWithNoPrefs contains tasks which currently do not have any alive locations - but which could come in 'later' : particularly relevant when spark app is just coming up and containers are still being added. This causes a large number of non node local tasks to be scheduled incurring significant network transfers in the cluster when running with non trivial datasets. The comment // Look for no-pref tasks after rack-local tasks since they can run anywhere. is misleading in the method code : locality levels start from process_local down to any, and so no prefs get scheduled much before rack. Also note that, currentLocalityIndex is reset to the taskLocality returned by this method - so returning PROCESS_LOCAL as the level will trigger wait times again. (Was relevant before recent change to scheduler, and might be again based on resolution of this issue). Found as part of writing test for SPARK-2931 -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-2962) Suboptimal scheduling in spark
[ https://issues.apache.org/jira/browse/SPARK-2962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14092418#comment-14092418 ] Matei Zaharia commented on SPARK-2962: -- I thought this was fixed in https://github.com/apache/spark/pull/1313. Is that not the case? Suboptimal scheduling in spark -- Key: SPARK-2962 URL: https://issues.apache.org/jira/browse/SPARK-2962 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.1.0 Environment: All Reporter: Mridul Muralidharan In findTask, irrespective of 'locality' specified, pendingTasksWithNoPrefs are always scheduled with PROCESS_LOCAL pendingTasksWithNoPrefs contains tasks which currently do not have any alive locations - but which could come in 'later' : particularly relevant when spark app is just coming up and containers are still being added. This causes a large number of non node local tasks to be scheduled incurring significant network transfers in the cluster when running with non trivial datasets. The comment // Look for no-pref tasks after rack-local tasks since they can run anywhere. is misleading in the method code : locality levels start from process_local down to any, and so no prefs get scheduled much before rack. Also note that, currentLocalityIndex is reset to the taskLocality returned by this method - so returning PROCESS_LOCAL as the level will trigger wait times again. (Was relevant before recent change to scheduler, and might be again based on resolution of this issue). Found as part of writing test for SPARK-2931 -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-2962) Suboptimal scheduling in spark
[ https://issues.apache.org/jira/browse/SPARK-2962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14092427#comment-14092427 ] Mridul Muralidharan commented on SPARK-2962: To give more context: a) Our jobs start with loading data from DFS as the starting point, and so this is the first stage that gets executed. b) We are sleeping for 1 minute before starting the jobs (in case the cluster is busy, etc.) - unfortunately, this is not sufficient, and IIRC there is no programmatic way to wait more deterministically for X% of nodes (was something added to alleviate this? I did see some discussion). c) This becomes more of a problem because Spark does not honour preferred locations anymore while running on YARN. See SPARK-208 - due to 1.0 interface changes. [ Practically, if we are using a large enough number of nodes (with replication of 3 or higher), we usually do end up with quite a lot of data-local tasks eventually - so (c) is not an immediate concern for our current jobs assuming (b) is not an issue, though it is suboptimal in the general case ] Suboptimal scheduling in spark -- Key: SPARK-2962 URL: https://issues.apache.org/jira/browse/SPARK-2962 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.1.0 Environment: All Reporter: Mridul Muralidharan In findTask, irrespective of 'locality' specified, pendingTasksWithNoPrefs are always scheduled with PROCESS_LOCAL pendingTasksWithNoPrefs contains tasks which currently do not have any alive locations - but which could come in 'later' : particularly relevant when spark app is just coming up and containers are still being added. This causes a large number of non node local tasks to be scheduled incurring significant network transfers in the cluster when running with non trivial datasets. The comment // Look for no-pref tasks after rack-local tasks since they can run anywhere. is misleading in the method code : locality levels start from process_local down to any, and so no prefs get scheduled much before rack. Also note that, currentLocalityIndex is reset to the taskLocality returned by this method - so returning PROCESS_LOCAL as the level will trigger wait times again. (Was relevant before recent change to scheduler, and might be again based on resolution of this issue). Found as part of writing test for SPARK-2931 -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-2962) Suboptimal scheduling in spark
[ https://issues.apache.org/jira/browse/SPARK-2962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14092430#comment-14092430 ] Mridul Muralidharan commented on SPARK-2962: Hi [~matei], I am referencing the latest code (as of yesterday night). pendingTasksWithNoPrefs currently contains both tasks which truly have no preference and tasks whose preferred locations are unavailable - and the latter is what is triggering this, since that can change during the execution of the stage. Hope I am not missing something? Thanks, Mridul Suboptimal scheduling in spark -- Key: SPARK-2962 URL: https://issues.apache.org/jira/browse/SPARK-2962 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.1.0 Environment: All Reporter: Mridul Muralidharan In findTask, irrespective of 'locality' specified, pendingTasksWithNoPrefs are always scheduled with PROCESS_LOCAL pendingTasksWithNoPrefs contains tasks which currently do not have any alive locations - but which could come in 'later' : particularly relevant when spark app is just coming up and containers are still being added. This causes a large number of non node local tasks to be scheduled incurring significant network transfers in the cluster when running with non trivial datasets. The comment // Look for no-pref tasks after rack-local tasks since they can run anywhere. is misleading in the method code : locality levels start from process_local down to any, and so no prefs get scheduled much before rack. Also note that, currentLocalityIndex is reset to the taskLocality returned by this method - so returning PROCESS_LOCAL as the level will trigger wait times again. (Was relevant before recent change to scheduler, and might be again based on resolution of this issue). Found as part of writing test for SPARK-2931 -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-2962) Suboptimal scheduling in spark
[ https://issues.apache.org/jira/browse/SPARK-2962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14092431#comment-14092431 ] Mridul Muralidharan commented on SPARK-2962: Note, I don't think this is a regression in 1.1; it probably existed much earlier too. Other issues are making us notice this (like SPARK-2089) - we moved to 1.1 from 0.9 recently. Suboptimal scheduling in spark -- Key: SPARK-2962 URL: https://issues.apache.org/jira/browse/SPARK-2962 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.1.0 Environment: All Reporter: Mridul Muralidharan In findTask, irrespective of 'locality' specified, pendingTasksWithNoPrefs are always scheduled with PROCESS_LOCAL pendingTasksWithNoPrefs contains tasks which currently do not have any alive locations - but which could come in 'later' : particularly relevant when spark app is just coming up and containers are still being added. This causes a large number of non node local tasks to be scheduled incurring significant network transfers in the cluster when running with non trivial datasets. The comment // Look for no-pref tasks after rack-local tasks since they can run anywhere. is misleading in the method code : locality levels start from process_local down to any, and so no prefs get scheduled much before rack. Also note that, currentLocalityIndex is reset to the taskLocality returned by this method - so returning PROCESS_LOCAL as the level will trigger wait times again. (Was relevant before recent change to scheduler, and might be again based on resolution of this issue). Found as part of writing test for SPARK-2931 -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-2962) Suboptimal scheduling in spark
[ https://issues.apache.org/jira/browse/SPARK-2962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14092427#comment-14092427 ] Mridul Muralidharan edited comment on SPARK-2962 at 8/11/14 4:35 AM: - To give more context: a) Our jobs start with loading data from DFS as the starting point, and so this is the first stage that gets executed. b) We are sleeping for 1 minute before starting the jobs (in case the cluster is busy, etc.) - unfortunately, this is not sufficient, and IIRC there is no programmatic way to wait more deterministically for X% of nodes (was something added to alleviate this? I did see some discussion). c) This becomes more of a problem because Spark does not honour preferred locations anymore while running on YARN. See SPARK-2089 - due to 1.0 interface changes. [ Practically, if we are using a large enough number of nodes (with replication of 3 or higher), we usually do end up with quite a lot of data-local tasks eventually - so (c) is not an immediate concern for our current jobs assuming (b) is not an issue, though it is suboptimal in the general case ] was (Author: mridulm80): To give more context: a) Our jobs start with loading data from DFS as the starting point, and so this is the first stage that gets executed. b) We are sleeping for 1 minute before starting the jobs (in case the cluster is busy, etc.) - unfortunately, this is not sufficient, and IIRC there is no programmatic way to wait more deterministically for X% of nodes (was something added to alleviate this? I did see some discussion). c) This becomes more of a problem because Spark does not honour preferred locations anymore while running on YARN. See SPARK-208 - due to 1.0 interface changes. [ Practically, if we are using a large enough number of nodes (with replication of 3 or higher), we usually do end up with quite a lot of data-local tasks eventually - so (c) is not an immediate concern for our current jobs assuming (b) is not an issue, though it is suboptimal in the general case ] Suboptimal scheduling in spark -- Key: SPARK-2962 URL: https://issues.apache.org/jira/browse/SPARK-2962 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.1.0 Environment: All Reporter: Mridul Muralidharan In findTask, irrespective of 'locality' specified, pendingTasksWithNoPrefs are always scheduled with PROCESS_LOCAL pendingTasksWithNoPrefs contains tasks which currently do not have any alive locations - but which could come in 'later' : particularly relevant when spark app is just coming up and containers are still being added. This causes a large number of non node local tasks to be scheduled incurring significant network transfers in the cluster when running with non trivial datasets. The comment // Look for no-pref tasks after rack-local tasks since they can run anywhere. is misleading in the method code : locality levels start from process_local down to any, and so no prefs get scheduled much before rack. Also note that, currentLocalityIndex is reset to the taskLocality returned by this method - so returning PROCESS_LOCAL as the level will trigger wait times again. (Was relevant before recent change to scheduler, and might be again based on resolution of this issue). Found as part of writing test for SPARK-2931 -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-2912) Jenkins should include the commit hash in his messages
[ https://issues.apache.org/jira/browse/SPARK-2912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14092434#comment-14092434 ] Michael Yannakopoulos commented on SPARK-2912: -- Hi Nicholas, I can work on this issue! Thanks, Michael Jenkins should include the commit hash in his messages -- Key: SPARK-2912 URL: https://issues.apache.org/jira/browse/SPARK-2912 Project: Spark Issue Type: Sub-task Components: Build Reporter: Nicholas Chammas When there are multiple test cycles within a PR, it is not obvious what cycle applies to what set of changes. This makes it more likely for committers to merge a PR that has had new commits added since the last PR. Requirements: * Add the commit hash to Jenkins's messages so it's clear what the test cycle corresponds to. * While you're at it, polish the formatting a bit. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-2912) Jenkins should include the commit hash in his messages
[ https://issues.apache.org/jira/browse/SPARK-2912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14092435#comment-14092435 ] Patrick Wendell commented on SPARK-2912: Hey Michael - I believe [~nchammas] is already working on it actually, so I assigned him. Jenkins should include the commit hash in his messages -- Key: SPARK-2912 URL: https://issues.apache.org/jira/browse/SPARK-2912 Project: Spark Issue Type: Sub-task Components: Build Reporter: Nicholas Chammas Assignee: Nicholas Chammas When there are multiple test cycles within a PR, it is not obvious what cycle applies to what set of changes. This makes it more likely for committers to merge a PR that has had new commits added since the last PR. Requirements: * Add the commit hash to Jenkins's messages so it's clear what the test cycle corresponds to. * While you're at it, polish the formatting a bit. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-2912) Jenkins should include the commit hash in his messages
[ https://issues.apache.org/jira/browse/SPARK-2912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14092437#comment-14092437 ] Michael Yannakopoulos commented on SPARK-2912: -- Thanks for the quick reply Patrick! Nice, I will try to find another open issue so as to resolve it. Jenkins should include the commit hash in his messages -- Key: SPARK-2912 URL: https://issues.apache.org/jira/browse/SPARK-2912 Project: Spark Issue Type: Sub-task Components: Build Reporter: Nicholas Chammas Assignee: Nicholas Chammas When there are multiple test cycles within a PR, it is not obvious what cycle applies to what set of changes. This makes it more likely for committers to merge a PR that has had new commits added since the last PR. Requirements: * Add the commit hash to Jenkins's messages so it's clear what the test cycle corresponds to. * While you're at it, polish the formatting a bit. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-2963) There is no documentation about building SparkSQL
Kousuke Saruta created SPARK-2963: - Summary: There is no documentation about building SparkSQL Key: SPARK-2963 URL: https://issues.apache.org/jira/browse/SPARK-2963 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.1.0 Reporter: Kousuke Saruta Currently, if we'd like to use SparkSQL, we need to use the -Phive-thriftserver option when building, but this is implicit. I think we need to describe how to build it. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org