[jira] [Created] (SPARK-21894) Some Netty errors do not propagate to the top level driver
Charles Allen created SPARK-21894:
-------------------------------------

             Summary: Some Netty errors do not propagate to the top level driver
                 Key: SPARK-21894
                 URL: https://issues.apache.org/jira/browse/SPARK-21894
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 2.1.0
            Reporter: Charles Allen


We have an environment with Netty 4.1 (https://issues.apache.org/jira/browse/SPARK-19552 for some context) and the following error occurs. The reason THIS issue is being filed is that the error leaves the Spark workload in a bad state where it makes no progress and does not shut down. The expected behavior is that the Spark job would throw an exception that can be caught by the driving application.

{code}
2017-09-01T16:13:32,175 ERROR [shuffle-server-3-2] org.apache.spark.network.server.TransportRequestHandler - Error sending result StreamResponse{streamId=/jars/lz4-1.3.0.jar, byteCount=236880, body=FileSegmentManagedBuffer{file=/Users/charlesallen/.m2/repository/net/jpountz/lz4/lz4/1.3.0/lz4-1.3.0.jar, offset=0, length=236880}} to /192.168.59.3:56703; closing connection
java.lang.AbstractMethodError
    at io.netty.util.ReferenceCountUtil.touch(ReferenceCountUtil.java:73) ~[netty-all-4.1.11.Final.jar:4.1.11.Final]
    at io.netty.channel.DefaultChannelPipeline.touch(DefaultChannelPipeline.java:107) ~[netty-all-4.1.11.Final.jar:4.1.11.Final]
    at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:810) ~[netty-all-4.1.11.Final.jar:4.1.11.Final]
    at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:723) ~[netty-all-4.1.11.Final.jar:4.1.11.Final]
    at io.netty.handler.codec.MessageToMessageEncoder.write(MessageToMessageEncoder.java:111) ~[netty-all-4.1.11.Final.jar:4.1.11.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:738) ~[netty-all-4.1.11.Final.jar:4.1.11.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:730) ~[netty-all-4.1.11.Final.jar:4.1.11.Final]
    at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:816) ~[netty-all-4.1.11.Final.jar:4.1.11.Final]
    at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:723) ~[netty-all-4.1.11.Final.jar:4.1.11.Final]
    at io.netty.handler.timeout.IdleStateHandler.write(IdleStateHandler.java:305) ~[netty-all-4.1.11.Final.jar:4.1.11.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:738) ~[netty-all-4.1.11.Final.jar:4.1.11.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeWriteAndFlush(AbstractChannelHandlerContext.java:801) ~[netty-all-4.1.11.Final.jar:4.1.11.Final]
    at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:814) ~[netty-all-4.1.11.Final.jar:4.1.11.Final]
    at io.netty.channel.AbstractChannelHandlerContext.writeAndFlush(AbstractChannelHandlerContext.java:794) ~[netty-all-4.1.11.Final.jar:4.1.11.Final]
    at io.netty.channel.AbstractChannelHandlerContext.writeAndFlush(AbstractChannelHandlerContext.java:831) ~[netty-all-4.1.11.Final.jar:4.1.11.Final]
    at io.netty.channel.DefaultChannelPipeline.writeAndFlush(DefaultChannelPipeline.java:1032) ~[netty-all-4.1.11.Final.jar:4.1.11.Final]
    at io.netty.channel.AbstractChannel.writeAndFlush(AbstractChannel.java:296) ~[netty-all-4.1.11.Final.jar:4.1.11.Final]
    at org.apache.spark.network.server.TransportRequestHandler.respond(TransportRequestHandler.java:194) [spark-network-common_2.11-2.1.0-mmx9.jar:2.1.0-mmx9]
    at org.apache.spark.network.server.TransportRequestHandler.processStreamRequest(TransportRequestHandler.java:150) [spark-network-common_2.11-2.1.0-mmx9.jar:2.1.0-mmx9]
    at org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:111) [spark-network-common_2.11-2.1.0-mmx9.jar:2.1.0-mmx9]
    at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:119) [spark-network-common_2.11-2.1.0-mmx9.jar:2.1.0-mmx9]
    at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:51) [spark-network-common_2.11-2.1.0-mmx9.jar:2.1.0-mmx9]
    at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105) [netty-all-4.1.11.Final.jar:4.1.11.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-all-4.1.11.Final.jar:4.1.11.Final]
    at
{code}
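For context on the {{AbstractMethodError}} above: Netty 4.1 added {{touch(Object)}} to the {{io.netty.util.ReferenceCounted}} interface, so a reference-counted class compiled against Netty 4.0.x (as Spark 2.1.0's network layer is) carries no implementation of it, and {{ReferenceCountUtil.touch()}} fails at call time rather than at load time. Below is a minimal sketch of the kind of override that is missing; the class name and body are illustrative assumptions, not Spark's actual code:

{code}
import io.netty.buffer.ByteBuf;
import io.netty.util.AbstractReferenceCounted;
import io.netty.util.ReferenceCounted;

// Hypothetical wrapper, analogous in shape to Spark's MessageWithHeader.
// Compiled against Netty 4.0 it would lack touch(Object), and Netty 4.1's
// ReferenceCountUtil.touch() would then throw AbstractMethodError at runtime.
public class HeaderedMessage extends AbstractReferenceCounted {
    private final ByteBuf header;

    public HeaderedMessage(ByteBuf header) {
        this.header = header;
    }

    // New in Netty 4.1; delegating to the underlying buffer is the usual
    // pattern so leak-detection hints follow the real resource.
    @Override
    public ReferenceCounted touch(Object hint) {
        header.touch(hint);
        return this;
    }

    @Override
    protected void deallocate() {
        header.release();
    }
}
{code}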
[jira] [Commented] (SPARK-19552) Upgrade Netty version to 4.1.8 final
[ https://issues.apache.org/jira/browse/SPARK-19552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16135819#comment-16135819 ]

Charles Allen commented on SPARK-19552:
---------------------------------------

[~aash] Do you have a link to an Apache Arrow issue on this?

> Upgrade Netty version to 4.1.8 final
> ------------------------------------
>
>                 Key: SPARK-19552
>                 URL: https://issues.apache.org/jira/browse/SPARK-19552
>             Project: Spark
>          Issue Type: Improvement
>          Components: Build
>    Affects Versions: 2.1.0
>            Reporter: Adam Roberts
>            Priority: Minor
>
> Netty 4.1.8 was recently released but isn't API compatible with previous major versions (like Netty 4.0.x); see http://netty.io/news/2017/01/30/4-0-44-Final-4-1-8-Final.html for details.
> This version does include a fix for a security concern, but not one we'd be exposed to with Spark "out of the box". Let's upgrade the version we use to be on the safe side, as the security fix I'm especially interested in is not available in the 4.0.x release line.
> We should move up anyway to take on a bunch of other bug fixes cited in the release notes (and if anyone were to use Spark with netty and tcnative, they shouldn't be exposed to the security problem) - we should be good citizens and make this change.
> As this 4.1 version involves API changes we'll need to implement a few methods and possibly adjust the Sasl tests. This JIRA and the associated pull request start the process, which I'll work on - and any help would be much appreciated! Currently I know:
> {code}
> @Override
> public void write(ChannelHandlerContext ctx, Object msg, ChannelPromise promise) throws Exception {
>   if (!foundEncryptionHandler) {
>     foundEncryptionHandler = ctx.channel().pipeline().get(encryptHandlerName) != null; <-- this returns false and causes test failures
>   }
>   ctx.write(msg, promise);
> }
> {code}
> Here's what changes will be required (at least):
> {code}common/network-common/src/main/java/org/apache/spark/network/crypto/TransportCipher.java{code} requires touch, retain and transferred methods
> {code}common/network-common/src/main/java/org/apache/spark/network/sasl/SaslEncryption.java{code} requires the above methods too
> {code}common/network-common/src/test/java/org/apache/spark/network/protocol/MessageWithHeaderSuite.java{code}
> With "dummy" implementations so we can at least compile and test, we'll see five new test failures to address. These are:
> {code}
> org.apache.spark.network.sasl.SparkSaslSuite.testFileRegionEncryption
> org.apache.spark.network.sasl.SparkSaslSuite.testSaslEncryption
> org.apache.spark.network.shuffle.ExternalShuffleSecuritySuite.testEncryption
> org.apache.spark.rpc.netty.NettyRpcEnvSuite.send with SASL encryption
> org.apache.spark.rpc.netty.NettyRpcEnvSuite.ask with SASL encryption
> {code}
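The three method names called out above map directly to Netty 4.1 interface changes: {{FileRegion}} gained {{transferred()}} (replacing the deprecated, misspelled {{transfered()}}), and every {{ReferenceCounted}} gained {{touch()}}/{{touch(Object)}} with covariant return types. A hedged sketch of the adapter shape this implies (the class name is an assumption for illustration, not the code the patch actually added):

{code}
import io.netty.channel.FileRegion;
import io.netty.util.AbstractReferenceCounted;

// Illustrative base class covering the Netty 4.1 surface; concrete
// subclasses still implement position(), count(), transferred(),
// transferTo() and deallocate().
abstract class FileRegionAdapter extends AbstractReferenceCounted implements FileRegion {

    // Netty 4.1 keeps the misspelled transfered() as a deprecated alias.
    @Override
    @SuppressWarnings("deprecation")
    public long transfered() {
        return transferred();
    }

    // Covariant overrides are required because FileRegion narrows the
    // return types declared by ReferenceCounted.
    @Override
    public FileRegionAdapter retain() {
        super.retain();
        return this;
    }

    @Override
    public FileRegionAdapter retain(int increment) {
        super.retain(increment);
        return this;
    }

    @Override
    public FileRegionAdapter touch() {
        return touch(null);
    }

    // A pass-through or no-op is enough to satisfy the new hook.
    @Override
    public FileRegionAdapter touch(Object hint) {
        return this;
    }
}
{code}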
[jira] [Commented] (SPARK-19111) S3 Mesos history upload fails silently if too large
[ https://issues.apache.org/jira/browse/SPARK-19111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16056601#comment-16056601 ]

Charles Allen commented on SPARK-19111:
---------------------------------------

[~ste...@apache.org] It makes it unstable because the history server chokes on such large files; the task itself finishes, though. We didn't debug what it is about the large files that makes the history server choke, but if I recall correctly the history files were on the order of tens of GB, so it wouldn't shock me if the history server didn't handle them efficiently.

> S3 Mesos history upload fails silently if too large
> ----------------------------------------------------
>
>                 Key: SPARK-19111
>                 URL: https://issues.apache.org/jira/browse/SPARK-19111
>             Project: Spark
>          Issue Type: Bug
>          Components: EC2, Mesos, Spark Core
>    Affects Versions: 2.0.0
>            Reporter: Charles Allen
>
> {code}
> 2017-01-06T21:32:32,928 INFO [main] org.apache.spark.ui.SparkUI - Stopped Spark web UI at http://REDACTED:4041
> 2017-01-06T21:32:32,938 INFO [SparkListenerBus] com.metamx.starfire.spark.SparkDriver - emitting metric: internal.metrics.jvmGCTime
> 2017-01-06T21:32:32,939 INFO [SparkListenerBus] com.metamx.starfire.spark.SparkDriver - emitting metric: internal.metrics.shuffle.read.localBlocksFetched
> 2017-01-06T21:32:32,939 INFO [SparkListenerBus] com.metamx.starfire.spark.SparkDriver - emitting metric: internal.metrics.resultSerializationTime
> 2017-01-06T21:32:32,939 ERROR [heartbeat-receiver-event-loop-thread] org.apache.spark.scheduler.LiveListenerBus - SparkListenerBus has already stopped! Dropping event SparkListenerExecutorMetricsUpdate(364,WrappedArray())
> 2017-01-06T21:32:32,939 INFO [SparkListenerBus] com.metamx.starfire.spark.SparkDriver - emitting metric: internal.metrics.resultSize
> 2017-01-06T21:32:32,939 INFO [SparkListenerBus] com.metamx.starfire.spark.SparkDriver - emitting metric: internal.metrics.peakExecutionMemory
> 2017-01-06T21:32:32,939 INFO [SparkListenerBus] com.metamx.starfire.spark.SparkDriver - emitting metric: internal.metrics.shuffle.read.fetchWaitTime
> 2017-01-06T21:32:32,939 INFO [SparkListenerBus] com.metamx.starfire.spark.SparkDriver - emitting metric: internal.metrics.memoryBytesSpilled
> 2017-01-06T21:32:32,940 INFO [SparkListenerBus] com.metamx.starfire.spark.SparkDriver - emitting metric: internal.metrics.shuffle.read.remoteBytesRead
> 2017-01-06T21:32:32,940 INFO [SparkListenerBus] com.metamx.starfire.spark.SparkDriver - emitting metric: internal.metrics.diskBytesSpilled
> 2017-01-06T21:32:32,940 INFO [SparkListenerBus] com.metamx.starfire.spark.SparkDriver - emitting metric: internal.metrics.shuffle.read.localBytesRead
> 2017-01-06T21:32:32,940 INFO [SparkListenerBus] com.metamx.starfire.spark.SparkDriver - emitting metric: internal.metrics.shuffle.read.recordsRead
> 2017-01-06T21:32:32,940 INFO [SparkListenerBus] com.metamx.starfire.spark.SparkDriver - emitting metric: internal.metrics.executorDeserializeTime
> 2017-01-06T21:32:32,940 INFO [SparkListenerBus] com.metamx.starfire.spark.SparkDriver - emitting metric: output/bytes
> 2017-01-06T21:32:32,941 INFO [SparkListenerBus] com.metamx.starfire.spark.SparkDriver - emitting metric: internal.metrics.executorRunTime
> 2017-01-06T21:32:32,941 INFO [SparkListenerBus] com.metamx.starfire.spark.SparkDriver - emitting metric: internal.metrics.shuffle.read.remoteBlocksFetched
> 2017-01-06T21:32:32,943 INFO [main] org.apache.hadoop.fs.s3native.NativeS3FileSystem - OutputStream for key 'eventLogs/remnant/46bf8f87-6de6-4da8-9cba-5b2fecd0875e-1387.inprogress' closed. Now beginning upload
> 2017-01-06T21:32:32,963 ERROR [heartbeat-receiver-event-loop-thread] org.apache.spark.scheduler.LiveListenerBus - SparkListenerBus has already stopped! Dropping event SparkListenerExecutorMetricsUpdate(905,WrappedArray())
> 2017-01-06T21:32:32,973 ERROR [heartbeat-receiver-event-loop-thread] org.apache.spark.scheduler.LiveListenerBus - SparkListenerBus has already stopped! Dropping event SparkListenerExecutorMetricsUpdate(519,WrappedArray())
> 2017-01-06T21:32:32,988 ERROR [heartbeat-receiver-event-loop-thread] org.apache.spark.scheduler.LiveListenerBus - SparkListenerBus has already stopped! Dropping event SparkListenerExecutorMetricsUpdate(596,WrappedArray())
> {code}
> Running spark on mesos, some large jobs fail to upload to the history server storage!
> A successful sequence of events in the log that yields an upload is as follows:
> {code}
> 2017-01-06T19:14:32,925 INFO [main] org.apache.hadoop.fs.s3native.NativeS3FileSystem - OutputStream for key 'eventLogs/remnant/46bf8f87-6de6-4da8-9cba-5b2fecd0875e-1434.inprogress' writing to tempfile '/mnt/tmp/hadoop/output-2516573909248961808.tmp'
> 2017-01-06T21:59:14,789 INFO
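The log lines above show the s3n write path: {{NativeS3FileSystem}} buffers the entire event log to a local temp file and only performs the S3 upload inside {{close()}}. That makes the upload easy to lose: if the caller treats {{close()}} as best-effort, a failed multi-gigabyte upload never surfaces as a job error. A sketch of that failure mode (the wrapper below is illustrative, not Hadoop's or Spark's actual code):

{code}
import java.io.IOException;
import java.io.OutputStream;

// Illustrative only: when the upload happens inside close() (as s3n's
// temp-file scheme does), swallowing the exception here is exactly how
// a too-large history upload fails silently.
final class BestEffortClose {
    static void finish(OutputStream eventLog) {
        try {
            eventLog.close(); // s3n starts the real upload at this point
        } catch (IOException e) {
            // Logging-and-continuing means the job still reports success.
            System.err.println("event log close failed: " + e);
        }
    }
}
{code}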
[jira] [Commented] (SPARK-19552) Upgrade Netty version to 4.1.8 final
[ https://issues.apache.org/jira/browse/SPARK-19552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16044925#comment-16044925 ]

Charles Allen commented on SPARK-19552:
---------------------------------------

This is starting to show problems on our side due to library issues: https://github.com/druid-io/druid/issues/4390

> Upgrade Netty version to 4.1.8 final
> ------------------------------------
>
>                 Key: SPARK-19552
>                 URL: https://issues.apache.org/jira/browse/SPARK-19552
>             Project: Spark
>          Issue Type: Improvement
>          Components: Build
>    Affects Versions: 2.1.0
>            Reporter: Adam Roberts
>            Priority: Minor
[jira] [Commented] (SPARK-4899) Support Mesos features: roles and checkpoints
[ https://issues.apache.org/jira/browse/SPARK-4899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15954212#comment-15954212 ]

Charles Allen commented on SPARK-4899:
--------------------------------------

It was discussed on the mailing list with [~timchen] that checkpointing might just need a timeout setting available to the other schedulers.

> Support Mesos features: roles and checkpoints
> ---------------------------------------------
>
>                 Key: SPARK-4899
>                 URL: https://issues.apache.org/jira/browse/SPARK-4899
>             Project: Spark
>          Issue Type: New Feature
>          Components: Mesos
>    Affects Versions: 1.2.0
>            Reporter: Andrew Ash
>
> Inspired by https://github.com/apache/spark/pull/60
> Mesos has two features that would be nice for Spark to take advantage of:
> 1. Roles -- a way to specify ACLs and priorities for users
> 2. Checkpoints -- a way to restart a failed Mesos slave without losing all the work that was happening on the box
> Some of these may require a Mesos upgrade past our current 0.18.1
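For reference, both Mesos features under discussion are plain {{FrameworkInfo}} fields set at registration time, and the "timeout setting" mentioned above corresponds to {{failover_timeout}}. A minimal sketch using the Mesos Java API; the role name and timeout value are illustrative, not recommended settings:

{code}
import org.apache.mesos.Protos.FrameworkInfo;

// Sketch of the registration-side knobs: role, checkpointing, and the
// failover timeout that makes checkpoint recovery useful.
public class FrameworkInfoSketch {
    public static FrameworkInfo build() {
        return FrameworkInfo.newBuilder()
            .setUser("")                // empty lets Mesos pick the current user
            .setName("Spark on Mesos example")
            .setRole("spark")           // role for ACLs and priorities
            .setCheckpoint(true)        // tasks survive a slave restart
            .setFailoverTimeout(3600.0) // seconds the master waits for re-registration
            .build();
    }
}
{code}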
[jira] [Commented] (SPARK-4899) Support Mesos features: roles and checkpoints
[ https://issues.apache.org/jira/browse/SPARK-4899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15954170#comment-15954170 ]

Charles Allen commented on SPARK-4899:
--------------------------------------

{{org.apache.spark.scheduler.cluster.mesos.MesosSchedulerUtils#createSchedulerDriver}} seems to allow checkpointing, but only {{org.apache.spark.scheduler.cluster.mesos.MesosClusterScheduler}} uses it. Neither the fine-grained nor the coarse-grained scheduler uses it; is there a reason for that?

> Support Mesos features: roles and checkpoints
> ---------------------------------------------
>
>                 Key: SPARK-4899
>                 URL: https://issues.apache.org/jira/browse/SPARK-4899
>             Project: Spark
>          Issue Type: New Feature
>          Components: Mesos
>    Affects Versions: 1.2.0
>            Reporter: Andrew Ash
[jira] [Commented] (SPARK-19698) Race condition in stale attempt task completion vs current attempt task completion when task is doing persistent state changes
[ https://issues.apache.org/jira/browse/SPARK-19698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15883331#comment-15883331 ]

Charles Allen commented on SPARK-19698:
---------------------------------------

[~mridulm80] is there documentation somewhere that describes output-commit best practices? I see a bunch of things that use either the Hadoop MR output committer or some Spark-specific output-committing code, but it is not clear when each should be used.

> Race condition in stale attempt task completion vs current attempt task completion when task is doing persistent state changes
> -------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-19698
>                 URL: https://issues.apache.org/jira/browse/SPARK-19698
>             Project: Spark
>          Issue Type: Bug
>          Components: Mesos, Spark Core
>    Affects Versions: 2.0.0
>            Reporter: Charles Allen
>
> We have encountered a strange scenario in our production environment. Below is the best guess we have right now as to what's going on.
> Potentially, the final stage of a job has a failure in one of the tasks (such as an OOME on the executor) which can cause tasks for that stage to be relaunched in a second attempt.
> https://github.com/apache/spark/blob/v2.1.0/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L1155 keeps track of which tasks have been completed, but does NOT keep track of which attempt those tasks were completed in. As such, we have encountered a scenario where a particular task gets executed twice in different stage attempts, and the DAGScheduler does not consider whether the second attempt is still running. This means that if the first task attempt succeeded, the second attempt can be cancelled part-way through its run cycle if all other tasks (including the prior failed one) are completed successfully.
> What this means is that if a task is manipulating some state somewhere (for example: an upload-to-temporary-file-location, then delete-then-move on an underlying s3n storage implementation) the driver can improperly shut down the running (2nd attempt) task between state manipulations, leaving the persistent state in a bad state since the 2nd attempt never got to complete its manipulations and was terminated prematurely at some arbitrary point in its state change logic (ex: finished the delete but not the move).
> This is using the mesos coarse-grained executor. It is unclear if this behavior is limited to the mesos coarse-grained executor or not.
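To make the race concrete, here is a schematic of the non-atomic commit sequence the description uses as its example; every name below is illustrative rather than a Spark API. If the driver kills the second-attempt task between the delete and the move, the store is left with neither the old nor the new object:

{code}
// Schematic of the delete-then-move window described above.
interface ObjectStore {
    void uploadToTemp(String tempKey) throws Exception;
    void delete(String finalKey) throws Exception;
    void move(String tempKey, String finalKey) throws Exception;
}

final class NonAtomicCommit {
    static void commit(ObjectStore store, String tempKey, String finalKey) throws Exception {
        store.uploadToTemp(tempKey);
        store.delete(finalKey);        // <-- a kill landing here loses data
        store.move(tempKey, finalKey); // never runs if the task was cancelled
    }
}
{code}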
[jira] [Comment Edited] (SPARK-19698) Race condition in stale attempt task completion vs current attempt task completion when task is doing persistent state changes
[ https://issues.apache.org/jira/browse/SPARK-19698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15878959#comment-15878959 ]

Charles Allen edited comment on SPARK-19698 at 2/22/17 6:46 PM:
----------------------------------------------------------------

I *think* this is due to the driver not having the concept of a "critical section" for code being executed, meaning that you can't declare a portion of the code being run as "I'm in a non-atomic or critical command region, please let me finish".

was (Author: drcrallen):
I *think* this is due to the driver not having the concept of a "critical section" for code being executed, meaning that you can't declare a portion of the code being run as "I'm in a non-idempotent command region, please let me finish".

> Race condition in stale attempt task completion vs current attempt task completion when task is doing persistent state changes
> -------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-19698
>                 URL: https://issues.apache.org/jira/browse/SPARK-19698
>             Project: Spark
>          Issue Type: Bug
>          Components: Mesos, Spark Core
>    Affects Versions: 2.0.0
>            Reporter: Charles Allen
[jira] [Commented] (SPARK-19698) Race condition in stale attempt task completion vs current attempt task completion when task is doing persistent state changes
[ https://issues.apache.org/jira/browse/SPARK-19698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15878959#comment-15878959 ]

Charles Allen commented on SPARK-19698:
---------------------------------------

I *think* this is due to the driver not having the concept of a "critical section" for code being executed, meaning that you can't declare a portion of the code being run as "I'm in a non-idempotent command region, please let me finish".

> Race condition in stale attempt task completion vs current attempt task completion when task is doing persistent state changes
> -------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-19698
>                 URL: https://issues.apache.org/jira/browse/SPARK-19698
>             Project: Spark
>          Issue Type: Bug
>          Components: Mesos, Spark Core
>    Affects Versions: 2.0.0
>            Reporter: Charles Allen
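What the comment is asking for might look like the following: an entirely hypothetical guard (Spark exposes no such primitive) that a task could use to mark a non-idempotent region. Note that it cannot actually stop an interrupt arriving mid-body; that gap is precisely the missing driver-side support being described:

{code}
import java.util.concurrent.Callable;

// Hypothetical API sketch only. It clears any pending interrupt before
// entering the region and re-asserts interruption afterwards; an interrupt
// delivered while body.call() is running still lands, which is the gap
// the comment describes.
final class CriticalSection {
    static <T> T run(Callable<T> body) throws Exception {
        boolean pending = Thread.interrupted(); // clear and remember
        try {
            return body.call();
        } finally {
            if (pending || Thread.interrupted()) {
                Thread.currentThread().interrupt(); // re-deliver afterwards
            }
        }
    }
}
{code}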
[jira] [Updated] (SPARK-19698) Race condition in stale attempt task completion vs current attempt task completion when task is doing persistent state changes
[ https://issues.apache.org/jira/browse/SPARK-19698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Charles Allen updated SPARK-19698:
----------------------------------
    Summary: Race condition in stale attempt task completion vs current attempt task completion when task is doing persistent state changes  (was: Race condition in stale attempt task completion vs current attempt task completion)

> Race condition in stale attempt task completion vs current attempt task completion when task is doing persistent state changes
> -------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-19698
>                 URL: https://issues.apache.org/jira/browse/SPARK-19698
>             Project: Spark
>          Issue Type: Bug
>          Components: Mesos, Spark Core
>    Affects Versions: 2.0.0
>            Reporter: Charles Allen
[jira] [Commented] (SPARK-19698) Race condition in stale attempt task completion vs current attempt task completion
[ https://issues.apache.org/jira/browse/SPARK-19698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15878927#comment-15878927 ]

Charles Allen commented on SPARK-19698:
---------------------------------------

[~jisookim0...@gmail.com] has been investigating this on our side.

> Race condition in stale attempt task completion vs current attempt task completion
> -----------------------------------------------------------------------------------
>
>                 Key: SPARK-19698
>                 URL: https://issues.apache.org/jira/browse/SPARK-19698
>             Project: Spark
>          Issue Type: Bug
>          Components: Mesos, Spark Core
>    Affects Versions: 2.0.0
>            Reporter: Charles Allen
[jira] [Created] (SPARK-19698) Race condition in stale attempt task completion vs current attempt task completion
Charles Allen created SPARK-19698:
-------------------------------------

             Summary: Race condition in stale attempt task completion vs current attempt task completion
                 Key: SPARK-19698
                 URL: https://issues.apache.org/jira/browse/SPARK-19698
             Project: Spark
          Issue Type: Bug
          Components: Mesos, Spark Core
    Affects Versions: 2.0.0
            Reporter: Charles Allen


We have encountered a strange scenario in our production environment. Below is the best guess we have right now as to what's going on.

Potentially, the final stage of a job has a failure in one of the tasks (such as an OOME on the executor) which can cause tasks for that stage to be relaunched in a second attempt.

https://github.com/apache/spark/blob/v2.1.0/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L1155 keeps track of which tasks have been completed, but does NOT keep track of which attempt those tasks were completed in. As such, we have encountered a scenario where a particular task gets executed twice in different stage attempts, and the DAGScheduler does not consider whether the second attempt is still running. This means that if the first task attempt succeeded, the second attempt can be cancelled part-way through its run cycle if all other tasks (including the prior failed one) are completed successfully.

What this means is that if a task is manipulating some state somewhere (for example: an upload-to-temporary-file-location, then delete-then-move on an underlying s3n storage implementation) the driver can improperly shut down the running (2nd attempt) task between state manipulations, leaving the persistent state in a bad state since the 2nd attempt never got to complete its manipulations and was terminated prematurely at some arbitrary point in its state change logic (ex: finished the delete but not the move).

This is using the mesos coarse-grained executor. It is unclear if this behavior is limited to the mesos coarse-grained executor or not.
[jira] [Commented] (SPARK-19479) Spark Mesos artifact split causes spark-core dependency to not pull in mesos impl
[ https://issues.apache.org/jira/browse/SPARK-19479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15855102#comment-15855102 ]

Charles Allen commented on SPARK-19479:
---------------------------------------

[~mgummelt] that's actually a really good suggestion. Somehow I never got subscribed to the dev list.

> Spark Mesos artifact split causes spark-core dependency to not pull in mesos impl
> ----------------------------------------------------------------------------------
>
>                 Key: SPARK-19479
>                 URL: https://issues.apache.org/jira/browse/SPARK-19479
>             Project: Spark
>          Issue Type: Bug
>          Components: Mesos, Spark Core
>    Affects Versions: 2.1.0
>            Reporter: Charles Allen
>
> https://github.com/apache/spark/pull/14637 (https://issues.apache.org/jira/browse/SPARK-16967) forked off the mesos impl into its own artifact, but the release notes do not call this out. This broke our deployments because we depend on packaging with spark-core, which no longer had any mesos awareness.
[jira] [Created] (SPARK-19479) Spark Mesos artifact split causes spark-core dependency to not pull in mesos impl
Charles Allen created SPARK-19479:
-------------------------------------

             Summary: Spark Mesos artifact split causes spark-core dependency to not pull in mesos impl
                 Key: SPARK-19479
                 URL: https://issues.apache.org/jira/browse/SPARK-19479
             Project: Spark
          Issue Type: Bug
          Components: Mesos, Spark Core
    Affects Versions: 2.1.0
            Reporter: Charles Allen


https://github.com/apache/spark/pull/14637 (https://issues.apache.org/jira/browse/SPARK-16967) forked off the mesos impl into its own artifact, but the release notes do not call this out. This broke our deployments because we depend on packaging with spark-core, which no longer had any mesos awareness.
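Concretely, since Spark 2.1.0 the Mesos scheduler lives in its own published artifact, so builds that depended on {{spark-core}} alone need an extra dependency. A hedged example of the Maven coordinates (Scala 2.11 build; verify the version against your deployment):

{code}
<!-- The Mesos integration moved out of spark-core in Spark 2.1.0 -->
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-mesos_2.11</artifactId>
  <version>2.1.0</version>
</dependency>
{code}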
[jira] [Commented] (SPARK-16333) Excessive Spark history event/json data size (5GB each)
[ https://issues.apache.org/jira/browse/SPARK-16333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15849433#comment-15849433 ]

Charles Allen commented on SPARK-16333:
---------------------------------------

We put in a fix for this in our local branch by (optionally) disabling a whole bunch of extra metrics that were added recently.

> Excessive Spark history event/json data size (5GB each)
> --------------------------------------------------------
>
>                 Key: SPARK-16333
>                 URL: https://issues.apache.org/jira/browse/SPARK-16333
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.0.0
>         Environment: this is seen on both x86 (Intel(R) Xeon(R) E5-2699) and ppc platforms (Habanero, Model: 8348-21C), Red Hat Enterprise Linux Server release 7.2 (Maipo), Spark 2.0.0-preview (May 24, 2016 build)
>            Reporter: Peter Liu
>              Labels: performance, spark2.0.0
>
> With Spark 2.0.0-preview (May-24 build), the history event data (the json file) that is generated for each Spark application run (see below) can be as big as 5GB, instead of 14 MB for exactly the same application run and the same input data of 1TB under Spark 1.6.1:
> -rwxrwx--- 1 root root 5.3G Jun 30 09:39 app-20160630091959-
> -rwxrwx--- 1 root root 5.3G Jun 30 09:56 app-20160630094213-
> -rwxrwx--- 1 root root 5.3G Jun 30 10:13 app-20160630095856-
> -rwxrwx--- 1 root root 5.3G Jun 30 10:30 app-20160630101556-
> The test is done with Sparkbench V2, SQL RDD (see github: https://github.com/SparkTC/spark-bench)
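The local fix above disables a set of verbose per-task metrics; the option name it adds is not given here. For comparison, the stock settings that bound event-log size look like this (a sketch; whether compression helps enough for multi-gigabyte logs depends on the workload):

{code}
import org.apache.spark.SparkConf;

// Standard Spark settings only; this is NOT the metric-disabling option
// from the local patch referenced above.
public class EventLogConf {
    public static SparkConf conf() {
        return new SparkConf()
            .set("spark.eventLog.enabled", "true")
            // compress the JSON event stream with the codec configured
            // via spark.io.compression.codec (lz4 by default)
            .set("spark.eventLog.compress", "true");
    }
}
{code}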
[jira] [Commented] (SPARK-19111) S3 Mesos history upload fails silently if too large
[ https://issues.apache.org/jira/browse/SPARK-19111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15840419#comment-15840419 ]

Charles Allen commented on SPARK-19111:
---------------------------------------

While switching to s3a helped the logs upload, it made the spark history server unusable, which is probably another bug.

> S3 Mesos history upload fails silently if too large
> ----------------------------------------------------
>
>                 Key: SPARK-19111
>                 URL: https://issues.apache.org/jira/browse/SPARK-19111
>             Project: Spark
>          Issue Type: Bug
>          Components: EC2, Mesos, Spark Core
>    Affects Versions: 2.0.0
>            Reporter: Charles Allen
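The s3a switch mentioned above amounts to pointing the event log at an {{s3a://}} URI (served by the hadoop-aws module) instead of {{s3n://}}. A hedged sketch with placeholder bucket and credential values:

{code}
import org.apache.spark.SparkConf;

// Sketch of the s3n -> s3a switch for event-log uploads. The bucket and
// credentials are placeholders; hadoop-aws must be on the classpath.
public class S3aEventLogConf {
    public static SparkConf conf() {
        return new SparkConf()
            .set("spark.eventLog.enabled", "true")
            .set("spark.eventLog.dir", "s3a://some-bucket/eventLogs")
            .set("spark.hadoop.fs.s3a.access.key", "ACCESS_KEY")
            .set("spark.hadoop.fs.s3a.secret.key", "SECRET_KEY");
    }
}
{code}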
[jira] [Commented] (SPARK-19111) S3 Mesos history upload fails silently if too large
[ https://issues.apache.org/jira/browse/SPARK-19111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15840418#comment-15840418 ]

Charles Allen commented on SPARK-19111:
---------------------------------------

We have a patch, https://github.com/apache/spark/pull/16714 for SPARK-16333, which fixes the problem on our side by disabling the verbose new metrics.

> S3 Mesos history upload fails silently if too large
> ----------------------------------------------------
>
>                 Key: SPARK-19111
>                 URL: https://issues.apache.org/jira/browse/SPARK-19111
>             Project: Spark
>          Issue Type: Bug
>          Components: EC2, Mesos, Spark Core
>    Affects Versions: 2.0.0
>            Reporter: Charles Allen
[jira] [Commented] (SPARK-19111) S3 Mesos history upload fails silently if too large
[ https://issues.apache.org/jira/browse/SPARK-19111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15812304#comment-15812304 ]

Charles Allen commented on SPARK-19111:
---------------------------------------

That's great information, thank you [~ste...@apache.org].

> S3 Mesos history upload fails silently if too large
> ----------------------------------------------------
>
>                 Key: SPARK-19111
>                 URL: https://issues.apache.org/jira/browse/SPARK-19111
>             Project: Spark
>          Issue Type: Bug
>          Components: EC2, Mesos, Spark Core
>    Affects Versions: 2.0.0
>            Reporter: Charles Allen
[jira] [Commented] (SPARK-19111) S3 Mesos history upload fails silently if too large
[ https://issues.apache.org/jira/browse/SPARK-19111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15812301#comment-15812301 ]

Charles Allen commented on SPARK-19111:
---------------------------------------

I was also going to have the folks here look at the closing sequence to see why the spark executor lifecycle wasn't waiting for the close to complete, or wasn't reporting the error. https://issues.apache.org/jira/browse/SPARK-12330 was filed previously as an "exits too fast" bug.

> S3 Mesos history upload fails silently if too large
> ----------------------------------------------------
>
>                 Key: SPARK-19111
>                 URL: https://issues.apache.org/jira/browse/SPARK-19111
>             Project: Spark
>          Issue Type: Bug
>          Components: EC2, Mesos, Spark Core
>    Affects Versions: 2.0.0
>            Reporter: Charles Allen
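SPARK-12330's "exits too fast" framing suggests a second failure mode besides a swallowed exception: shutdown work that is started but never awaited. An illustrative sketch (not Spark's actual shutdown code) of how a JVM can exit mid-upload without reporting anything:

{code}
// Illustrative lifecycle gap: an upload kicked off on a daemon thread
// during shutdown is abandoned when the JVM exits.
final class FireAndForgetStop {
    static void stop(Runnable uploadEventLog) {
        Thread t = new Thread(uploadEventLog, "event-log-upload");
        t.setDaemon(true); // JVM exit will not wait for a daemon thread
        t.start();
        // Missing: t.join() plus propagation of any upload failure.
    }
}
{code}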
[jira] [Commented] (SPARK-19111) S3 Mesos history upload fails silently if too large
[ https://issues.apache.org/jira/browse/SPARK-19111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15812286#comment-15812286 ] Charles Allen commented on SPARK-19111: --- If you prefer "re-open on more data" then I can definitely accommodate that. > S3 Mesos history upload fails silently if too large > --- > > Key: SPARK-19111 > URL: https://issues.apache.org/jira/browse/SPARK-19111 > Project: Spark > Issue Type: Bug > Components: EC2, Mesos, Spark Core >Affects Versions: 2.0.0 >Reporter: Charles Allen
[jira] [Commented] (SPARK-19111) S3 Mesos history upload fails silently if too large
[ https://issues.apache.org/jira/browse/SPARK-19111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15809837#comment-15809837 ] Charles Allen commented on SPARK-19111: --- So to clarify: I agree this ticket is not actionable as is, but I have not seen this effect recorded in another ticket (is there one?), so I propose leaving this open until more information is available, either from our side or from others who report a similar problem. > S3 Mesos history upload fails silently if too large > --- > > Key: SPARK-19111 > URL: https://issues.apache.org/jira/browse/SPARK-19111 > Project: Spark > Issue Type: Bug > Components: EC2, Mesos, Spark Core >Affects Versions: 2.0.0 >Reporter: Charles Allen
[jira] [Commented] (SPARK-19111) S3 Mesos history upload fails silently if too large
[ https://issues.apache.org/jira/browse/SPARK-19111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15807603#comment-15807603 ] Charles Allen commented on SPARK-19111: --- I have not been able to finish the root-cause investigation, but I know the upload works for all jobs except our largest Spark job, and it consistently fails for that job. > S3 Mesos history upload fails silently if too large > --- > > Key: SPARK-19111 > URL: https://issues.apache.org/jira/browse/SPARK-19111 > Project: Spark > Issue Type: Bug > Components: EC2, Mesos, Spark Core >Affects Versions: 2.0.0 >Reporter: Charles Allen
[jira] [Updated] (SPARK-19111) S3 Mesos history upload fails if too large
[ https://issues.apache.org/jira/browse/SPARK-19111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Charles Allen updated SPARK-19111: -- Summary: S3 Mesos history upload fails if too large (was: S3 Mesos history upload fails if too large or if distributed datastore is misbehaving) > S3 Mesos history upload fails if too large > -- > > Key: SPARK-19111 > URL: https://issues.apache.org/jira/browse/SPARK-19111 > Project: Spark > Issue Type: Bug > Components: EC2, Mesos, Spark Core >Affects Versions: 2.0.0 >Reporter: Charles Allen
[jira] [Updated] (SPARK-19111) S3 Mesos history upload fails silently if too large
[ https://issues.apache.org/jira/browse/SPARK-19111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Charles Allen updated SPARK-19111: -- Summary: S3 Mesos history upload fails silently if too large (was: S3 Mesos history upload fails if too large) > S3 Mesos history upload fails silently if too large > --- > > Key: SPARK-19111 > URL: https://issues.apache.org/jira/browse/SPARK-19111 > Project: Spark > Issue Type: Bug > Components: EC2, Mesos, Spark Core >Affects Versions: 2.0.0 >Reporter: Charles Allen
[jira] [Created] (SPARK-19111) S3 Mesos history upload fails if too large or if distributed datastore is misbehaving
Charles Allen created SPARK-19111: - Summary: S3 Mesos history upload fails if too large or if distributed datastore is misbehaving Key: SPARK-19111 URL: https://issues.apache.org/jira/browse/SPARK-19111 Project: Spark Issue Type: Bug Components: EC2, Mesos, Spark Core Affects Versions: 2.0.0 Reporter: Charles Allen {code} 2017-01-06T21:32:32,928 INFO [main] org.apache.spark.ui.SparkUI - Stopped Spark web UI at http://REDACTED:4041 2017-01-06T21:32:32,938 INFO [SparkListenerBus] com.metamx.starfire.spark.SparkDriver - emitting metric: internal.metrics.jvmGCTime 2017-01-06T21:32:32,939 INFO [SparkListenerBus] com.metamx.starfire.spark.SparkDriver - emitting metric: internal.metrics.shuffle.read.localBlocksFetched 2017-01-06T21:32:32,939 INFO [SparkListenerBus] com.metamx.starfire.spark.SparkDriver - emitting metric: internal.metrics.resultSerializationTime 2017-01-06T21:32:32,939 ERROR [heartbeat-receiver-event-loop-thread] org.apache.spark.scheduler.LiveListenerBus - SparkListenerBus has already stopped! Dropping event SparkListenerExecutorMetricsUpdate( 364,WrappedArray()) 2017-01-06T21:32:32,939 INFO [SparkListenerBus] com.metamx.starfire.spark.SparkDriver - emitting metric: internal.metrics.resultSize 2017-01-06T21:32:32,939 INFO [SparkListenerBus] com.metamx.starfire.spark.SparkDriver - emitting metric: internal.metrics.peakExecutionMemory 2017-01-06T21:32:32,939 INFO [SparkListenerBus] com.metamx.starfire.spark.SparkDriver - emitting metric: internal.metrics.shuffle.read.fetchWaitTime 2017-01-06T21:32:32,939 INFO [SparkListenerBus] com.metamx.starfire.spark.SparkDriver - emitting metric: internal.metrics.memoryBytesSpilled 2017-01-06T21:32:32,940 INFO [SparkListenerBus] com.metamx.starfire.spark.SparkDriver - emitting metric: internal.metrics.shuffle.read.remoteBytesRead 2017-01-06T21:32:32,940 INFO [SparkListenerBus] com.metamx.starfire.spark.SparkDriver - emitting metric: internal.metrics.diskBytesSpilled 2017-01-06T21:32:32,940 INFO [SparkListenerBus] com.metamx.starfire.spark.SparkDriver - emitting metric: internal.metrics.shuffle.read.localBytesRead 2017-01-06T21:32:32,940 INFO [SparkListenerBus] com.metamx.starfire.spark.SparkDriver - emitting metric: internal.metrics.shuffle.read.recordsRead 2017-01-06T21:32:32,940 INFO [SparkListenerBus] com.metamx.starfire.spark.SparkDriver - emitting metric: internal.metrics.executorDeserializeTime 2017-01-06T21:32:32,940 INFO [SparkListenerBus] com.metamx.starfire.spark.SparkDriver - emitting metric: output/bytes 2017-01-06T21:32:32,941 INFO [SparkListenerBus] com.metamx.starfire.spark.SparkDriver - emitting metric: internal.metrics.executorRunTime 2017-01-06T21:32:32,941 INFO [SparkListenerBus] com.metamx.starfire.spark.SparkDriver - emitting metric: internal.metrics.shuffle.read.remoteBlocksFetched 2017-01-06T21:32:32,943 INFO [main] org.apache.hadoop.fs.s3native.NativeS3FileSystem - OutputStream for key 'eventLogs/remnant/46bf8f87-6de6-4da8-9cba-5b2fecd0875e-1387.inprogress' closed. Now beginning upload 2017-01-06T21:32:32,963 ERROR [heartbeat-receiver-event-loop-thread] org.apache.spark.scheduler.LiveListenerBus - SparkListenerBus has already stopped! Dropping event SparkListenerExecutorMetricsUpdate(905,WrappedArray()) 2017-01-06T21:32:32,973 ERROR [heartbeat-receiver-event-loop-thread] org.apache.spark.scheduler.LiveListenerBus - SparkListenerBus has already stopped! 
Dropping event SparkListenerExecutorMetricsUpdate(519,WrappedArray()) 2017-01-06T21:32:32,988 ERROR [heartbeat-receiver-event-loop-thread] org.apache.spark.scheduler.LiveListenerBus - SparkListenerBus has already stopped! Dropping event SparkListenerExecutorMetricsUpdate(596,WrappedArray()) {code} Running spark on mesos, some large jobs fail to upload to the history server storage! A successful sequence of events in the log that yields an upload is as follows: {code} 2017-01-06T19:14:32,925 INFO [main] org.apache.hadoop.fs.s3native.NativeS3FileSystem - OutputStream for key 'eventLogs/remnant/46bf8f87-6de6-4da8-9cba-5b2fecd0875e-1434.inprogress' writing to tempfile '/mnt/tmp/hadoop/output-2516573909248961808.tmp' 2017-01-06T21:59:14,789 INFO [main] org.apache.hadoop.fs.s3native.NativeS3FileSystem - OutputStream for key 'eventLogs/remnant/46bf8f87-6de6-4da8-9cba-5b2fecd0875e-1434.inprogress' closed. Now beginning upload 2017-01-06T21:59:44,679 INFO [main] org.apache.hadoop.fs.s3native.NativeS3FileSystem - OutputStream for key 'eventLogs/remnant/46bf8f87-6de6-4da8-9cba-5b2fecd0875e-1434.inprogress' upload complete {code} But large jobs do not ever get to the {{upload complete}} log message, and instead exit before completion.
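For context on why the failure is silent: the "closed. Now beginning upload" / "upload complete" pair comes from NativeS3FileSystem, which (as the "writing to tempfile" line shows) spools the whole stream to local disk and performs the actual S3 PUT inside close(). The sketch below models that write pattern (a simplification for illustration, not the Hadoop source); everything between the two log lines is one long blocking upload, and a process that exits in that window loses the file without surfacing an error to the driver:

{code:title=SpoolingUploadSketch.scala}
import java.io.{File, FileOutputStream, OutputStream}

// Simplified model of the NativeS3FileSystem output stream: write() goes to
// a local tempfile; the real S3 upload happens synchronously inside close().
class SpoolingUploadStream(key: String, upload: File => Unit) extends OutputStream {
  private val tmp = File.createTempFile("output-", ".tmp")
  private val out = new FileOutputStream(tmp)

  override def write(b: Int): Unit = out.write(b)

  override def close(): Unit = {
    out.close()
    println(s"OutputStream for key '$key' closed. Now beginning upload")
    upload(tmp) // minutes-long for multi-GB event logs; dies with the process
    println(s"OutputStream for key '$key' upload complete")
  }
}
{code}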
[jira] [Created] (SPARK-18600) BZ2 CRC read error needs better reporting
Charles Allen created SPARK-18600: - Summary: BZ2 CRC read error needs better reporting Key: SPARK-18600 URL: https://issues.apache.org/jira/browse/SPARK-18600 Project: Spark Issue Type: Bug Components: SQL Reporter: Charles Allen {code} 16/11/25 20:05:03 ERROR InsertIntoHadoopFsRelationCommand: Aborting job. org.apache.spark.SparkException: Job aborted due to stage failure: Task 148 in stage 5.0 failed 1 times, most recent failure: Lost task 148.0 in stage 5.0 (TID 5945, localhost): org.apache.spark.SparkException: Task failed while writing rows at org.apache.spark.sql.execution.datasources.DefaultWriterContainer.writeRows(WriterContainer.scala:261) at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(InsertIntoHadoopFsRelationCommand.scala:143) at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(InsertIntoHadoopFsRelationCommand.scala:143) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70) at org.apache.spark.scheduler.Task.run(Task.scala:86) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Caused by: com.univocity.parsers.common.TextParsingException: java.lang.IllegalStateException - Error reading from input Parser Configuration: CsvParserSettings: Auto configuration enabled=true Autodetect column delimiter=false Autodetect quotes=false Column reordering enabled=true Empty value=null Escape unquoted values=false Header extraction enabled=null Headers=[INTERVALSTARTTIME_GMT, INTERVALENDTIME_GMT, OPR_DT, OPR_HR, NODE_ID_XML, NODE_ID, NODE, MARKET_RUN_ID, LMP_TYPE, XML_DATA_ITEM, PNODE_RESMRID, GRP_TYPE, POS, VALUE, OPR_INTERVAL, GROUP] Ignore leading whitespaces=false Ignore trailing whitespaces=false Input buffer size=128 Input reading on separate thread=false Keep escape sequences=false Line separator detection enabled=false Maximum number of characters per column=100 Maximum number of columns=20480 Normalize escaped line separators=true Null value= Number of records to read=all Row processor=none RowProcessor error handler=null Selected fields=none Skip empty lines=true Unescaped quote handling=STOP_AT_DELIMITERFormat configuration: CsvFormat: Comment character=\0 Field delimiter=, Line separator (normalized)=\n Line separator sequence=\n Quote character=" Quote escape character=\ Quote escape escape character=null Internal state when error was thrown: line=27089, column=13, record=27089, charIndex=4451456, headers=[INTERVALSTARTTIME_GMT, INTERVALENDTIME_GMT, OPR_DT, OPR_HR, NODE_ID_XML, NODE_ID, NODE, MARKET_RUN_ID, LMP_TYPE, XML_DATA_ITEM, PNODE_RESMRID, GRP_TYPE, POS, VALUE, OPR_INTERVAL, GROUP] at com.univocity.parsers.common.AbstractParser.handleException(AbstractParser.java:302) at com.univocity.parsers.common.AbstractParser.parseNext(AbstractParser.java:431) at org.apache.spark.sql.execution.datasources.csv.BulkCsvReader.next(CSVParser.scala:148) at org.apache.spark.sql.execution.datasources.csv.BulkCsvReader.next(CSVParser.scala:131) at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434) at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440) at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408) at 
org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:91) at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source) at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:370) at org.apache.spark.sql.execution.datasources.DefaultWriterContainer$$anonfun$writeRows$1.apply$mcV$sp(WriterContainer.scala:253) at org.apache.spark.sql.execution.datasources.DefaultWriterContainer$$anonfun$writeRows$1.apply(WriterContainer.scala:252) at org.apache.spark.sql.execution.datasources.DefaultWriterContainer$$anonfun$writeRows$1.apply(WriterContainer.scala:252) at
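One way to tell a genuine BZ2 CRC failure apart from the univocity parser wrapping above is to drain the suspect file through Hadoop's codec directly, outside the CSV reader. A minimal sketch with a hypothetical path; the raw IOException thrown by the decompressor is the error this ticket argues should be surfaced:

{code:title=Bz2CrcProbe.scala}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.compress.CompressionCodecFactory

object Bz2CrcProbe {
  def main(args: Array[String]): Unit = {
    // Hypothetical location of the file that fails to parse.
    val path = new Path("s3n://bucket/data/suspect.csv.bz2")
    val conf = new Configuration()
    // Resolves BZip2Codec from the .bz2 suffix.
    val codec = new CompressionCodecFactory(conf).getCodec(path)
    val in = codec.createInputStream(path.getFileSystem(conf).open(path))
    try {
      val buf = new Array[Byte](1 << 16)
      while (in.read(buf) != -1) {} // a corrupt block throws the raw CRC IOException here
    } finally {
      in.close()
    }
  }
}
{code}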
[jira] [Commented] (SPARK-6305) Add support for log4j 2.x to Spark
[ https://issues.apache.org/jira/browse/SPARK-6305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15511767#comment-15511767 ] Charles Allen commented on SPARK-6305: -- Just FYI, as I found out recently, kafka (at least 8.x) requires log4j on the classpath (http://mail-archives.apache.org/mod_mbox/kafka-users/201401.mbox/%3ccaa7ooca0+3sltognxaxwofysedkysfyqt0hs_a6r3jy...@mail.gmail.com%3E is the only other reference to this problem I could find). But the slf4j-log4j12 bridge can at least be removed. > Add support for log4j 2.x to Spark > -- > > Key: SPARK-6305 > URL: https://issues.apache.org/jira/browse/SPARK-6305 > Project: Spark > Issue Type: Improvement > Components: Build >Reporter: Tal Sliwowicz >Priority: Minor > > log4j 2 requires replacing the slf4j binding and adding the log4j jars in the > classpath. Since there are shaded jars, it must be done during the build.
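For anyone else untangling this, keeping log4j for Kafka while dropping the slf4j-log4j12 binding is a dependency-management change. A sketch of what that could look like in sbt; the Kafka coordinates and versions are illustrative:

{code:title=build.sbt}
libraryDependencies ++= Seq(
  // Kafka 0.8.x still expects log4j classes on the runtime classpath.
  "log4j" % "log4j" % "1.2.17",
  // Keep the Kafka client but drop the slf4j-log4j12 binding it pulls in,
  // leaving room for a different slf4j backend.
  ("org.apache.kafka" %% "kafka" % "0.8.2.2").exclude("org.slf4j", "slf4j-log4j12")
)
{code}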
[jira] [Commented] (SPARK-13640) Synchronize ScalaReflection.mirror method.
[ https://issues.apache.org/jira/browse/SPARK-13640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15499471#comment-15499471 ] Charles Allen commented on SPARK-13640: --- These failed on first attempt: {code} org.apache.spark.sql.catalyst.ScalaReflectionSuite.SPARK-13640: thread safety of constructorFor org.apache.spark.sql.catalyst.ScalaReflectionSuite.SPARK-13640: thread safety of extractorsFor org.apache.spark.sql.catalyst.ScalaReflectionSuite.SPARK-13640: thread safety of schemaFor {code} Second attempt: {code} org.apache.spark.sql.catalyst.ScalaReflectionSuite.SPARK-13640: thread safety of dataTypeFor org.apache.spark.sql.catalyst.ScalaReflectionSuite.SPARK-13640: thread safety of extractorsFor {code} > Synchronize ScalaReflection.mirror method. > -- > > Key: SPARK-13640 > URL: https://issues.apache.org/jira/browse/SPARK-13640 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Takuya Ueshin >Assignee: Takuya Ueshin > Fix For: 2.0.0 > > > {{ScalaReflection.mirror}} method should be synchronized when scala version > is 2.10 because {{universe.runtimeMirror}} is not thread safe.
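For reference, the fix the issue description calls for amounts to putting a lock around the mirror accessor. A minimal sketch of that pattern (the general shape only, not the actual Spark patch):

{code:title=MirrorSync.scala}
import scala.reflect.runtime.{universe => ru}

object MirrorSync {
  private val lock = new Object

  // universe.runtimeMirror is not thread safe on Scala 2.10, so serialize
  // access to it; callers share one synchronized entry point.
  def mirror: ru.Mirror = lock.synchronized {
    ru.runtimeMirror(Thread.currentThread().getContextClassLoader)
  }
}
{code}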
[jira] [Commented] (SPARK-13640) Synchronize ScalaReflection.mirror method.
[ https://issues.apache.org/jira/browse/SPARK-13640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15499457#comment-15499457 ] Charles Allen commented on SPARK-13640: --- I keep hitting Scala 2.10 test failures for this patch. > Synchronize ScalaReflection.mirror method. > -- > > Key: SPARK-13640 > URL: https://issues.apache.org/jira/browse/SPARK-13640 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Takuya Ueshin >Assignee: Takuya Ueshin > Fix For: 2.0.0
[jira] [Commented] (SPARK-11714) Make Spark on Mesos honor port restrictions
[ https://issues.apache.org/jira/browse/SPARK-11714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15421478#comment-15421478 ] Charles Allen commented on SPARK-11714: --- Awesome! Thanks guys! > Make Spark on Mesos honor port restrictions > --- > > Key: SPARK-11714 > URL: https://issues.apache.org/jira/browse/SPARK-11714 > Project: Spark > Issue Type: Improvement > Components: Mesos >Reporter: Charles Allen >Assignee: Stavros Kontopoulos > Fix For: 2.1.0 > > > Currently the MesosSchedulerBackend does not make any effort to honor "ports" > as a resource offer in Mesos. This ask is to have the ports which the > executor binds to honor the limits of the "ports" resource of an offer.
[jira] [Created] (SPARK-16952) [MESOS] MesosCoarseGrainedSchedulerBackend requires spark.mesos.executor.home even if spark.executor.uri is set
Charles Allen created SPARK-16952: - Summary: [MESOS] MesosCoarseGrainedSchedulerBackend requires spark.mesos.executor.home even if spark.executor.uri is set Key: SPARK-16952 URL: https://issues.apache.org/jira/browse/SPARK-16952 Project: Spark Issue Type: Bug Components: Mesos, Scheduler Affects Versions: 2.0.0, 1.6.1, 1.6.0, 1.5.2 Reporter: Charles Allen Priority: Minor In the Mesos coarse grained scheduler, setting `spark.executor.uri` bypasses the code path which requires `spark.mesos.executor.home` since the uri effectively provides the executor home. But `org.apache.spark.scheduler.cluster.mesos.MesosCoarseGrainedSchedulerBackend#createCommand` requires `spark.mesos.executor.home` to be set regardless. Our workaround is to set `spark.mesos.executor.home=/dev/null` when using an executor uri.
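Expressed in driver code, the workaround looks like the following sketch (the executor URI is a placeholder):

{code:title=MesosExecutorHomeWorkaround.scala}
import org.apache.spark.SparkConf

object MesosExecutorHomeWorkaround {
  def conf: SparkConf = new SparkConf()
    .setAppName("example")
    // The URI supplies the executor distribution, so executor.home is unused...
    .set("spark.executor.uri", "https://example.com/spark-2.0.0-bin.tgz")
    // ...but createCommand still insists it be set; point it somewhere harmless.
    .set("spark.mesos.executor.home", "/dev/null")
}
{code}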
[jira] [Commented] (SPARK-16798) java.lang.IllegalArgumentException: bound must be positive : Worked in 1.5.2
[ https://issues.apache.org/jira/browse/SPARK-16798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15412046#comment-15412046 ] Charles Allen commented on SPARK-16798: --- I have a much better automated packaging and deployment system up now, which provides much stricter guarantees over binary delivery (aka, no possibility for Charles to fat-finger something), and this error is no longer showing up on stock spark. So I consider this closed: it was a side effect of some oddity in the packaging, or of unexpected second-order effects from patching org.apache.spark.scheduler.cluster.mesos.MesosCoarseGrainedSchedulerBackend > java.lang.IllegalArgumentException: bound must be positive : Worked in 1.5.2 > > > Key: SPARK-16798 > URL: https://issues.apache.org/jira/browse/SPARK-16798 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.0.0 >Reporter: Charles Allen > > Code at https://github.com/metamx/druid-spark-batch which was working under > 1.5.2 has ceased to function under 2.0.0 with the below stacktrace. > {code} > java.lang.IllegalArgumentException: bound must be positive > at java.util.Random.nextInt(Random.java:388) > at > org.apache.spark.rdd.RDD$$anonfun$coalesce$1$$anonfun$9.apply(RDD.scala:445) > at > org.apache.spark.rdd.RDD$$anonfun$coalesce$1$$anonfun$9.apply(RDD.scala:444) > at > org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$25.apply(RDD.scala:807) > at > org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$25.apply(RDD.scala:807) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:283) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:79) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:47) > at org.apache.spark.scheduler.Task.run(Task.scala:85) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > {code}
[jira] [Commented] (SPARK-16798) java.lang.IllegalArgumentException: bound must be positive : Worked in 1.5.2
[ https://issues.apache.org/jira/browse/SPARK-16798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15407877#comment-15407877 ] Charles Allen commented on SPARK-16798: --- I reproduced it with stock spark. I'm working on getting a tarball attached to this ticket which reproduces the error reliably. > java.lang.IllegalArgumentException: bound must be positive : Worked in 1.5.2 > > > Key: SPARK-16798 > URL: https://issues.apache.org/jira/browse/SPARK-16798 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.0.0 >Reporter: Charles Allen
[jira] [Comment Edited] (SPARK-16798) java.lang.IllegalArgumentException: bound must be positive : Worked in 1.5.2
[ https://issues.apache.org/jira/browse/SPARK-16798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15407877#comment-15407877 ] Charles Allen edited comment on SPARK-16798 at 8/4/16 2:49 PM: --- I reproduced it with stock spark. I'm working on getting a tarball attached to this ticket which reproduces the error reliably using stock spark. was (Author: drcrallen): I reproduced it with stock spark. I'm working on getting a tarball attached to this ticket which reproduces the error reliably. > java.lang.IllegalArgumentException: bound must be positive : Worked in 1.5.2 > > > Key: SPARK-16798 > URL: https://issues.apache.org/jira/browse/SPARK-16798 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.0.0 >Reporter: Charles Allen
[jira] [Comment Edited] (SPARK-16798) java.lang.IllegalArgumentException: bound must be positive : Worked in 1.5.2
[ https://issues.apache.org/jira/browse/SPARK-16798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15406967#comment-15406967 ] Charles Allen edited comment on SPARK-16798 at 8/4/16 1:41 AM: --- [~srowen] I manually went in with IntelliJ debugging and can confirm that the driver DOES have a valid positive integer value for numPartitions when in the DRIVER. But when running in the mesos executor or in local[4], the task has a value of 0 consistently. I have been able to reproduce this with TPCH data, so I can share it around if you can point me to someone who can help debug what might have changed. The code snippet below is from RDD.scala, with my own comments added {code:title=RDD.scala} def coalesce(numPartitions: Int, shuffle: Boolean = false, partitionCoalescer: Option[PartitionCoalescer] = Option.empty) (implicit ord: Ordering[T] = null) : RDD[T] = withScope { require(numPartitions > 0, s"Number of partitions ($numPartitions) must be positive.") // Correct on DRIVER if (shuffle) { /** Distributes elements evenly across output partitions, starting from a random partition. */ val distributePartition = (index: Int, items: Iterator[T]) => { var position = (new Random(index)).nextInt(numPartitions) // numPartitions == 0 in TASK items.map { t => // Note that the hash code of the key will just be the key itself. The HashPartitioner // will mod it with the number of total partitions. position = position + 1 (position, t) } } : Iterator[(Int, T)] // include a shuffle step so that our upstream tasks are still distributed new CoalescedRDD( new ShuffledRDD[Int, T, T](mapPartitionsWithIndex(distributePartition), new HashPartitioner(numPartitions)), numPartitions, partitionCoalescer).values } else { new CoalescedRDD(this, numPartitions, partitionCoalescer) } } {code} was (Author: drcrallen): [~srowen] I manually went in with IntelliJ debugging and can confirm that the driver DOES have a valid positive integer value for numPartitions when in the DRIVER. But when running in the mesos executor or in local[4], the task has a value of 0 consistently. I have been able to reproduce this with TPCH data, so I can share it around if you can point me to someone who can help debug what might have changed. The code snippet below is from RDD.scala, with my own comments addedl {code:title=RDD.scala} def coalesce(numPartitions: Int, shuffle: Boolean = false, partitionCoalescer: Option[PartitionCoalescer] = Option.empty) (implicit ord: Ordering[T] = null) : RDD[T] = withScope { require(numPartitions > 0, s"Number of partitions ($numPartitions) must be positive.") // Correct on DRIVER if (shuffle) { /** Distributes elements evenly across output partitions, starting from a random partition. */ val distributePartition = (index: Int, items: Iterator[T]) => { var position = (new Random(index)).nextInt(numPartitions) // numPartitions == 0 in TASK items.map { t => // Note that the hash code of the key will just be the key itself. The HashPartitioner // will mod it with the number of total partitions. 
position = position + 1 (position, t) } } : Iterator[(Int, T)] // include a shuffle step so that our upstream tasks are still distributed new CoalescedRDD( new ShuffledRDD[Int, T, T](mapPartitionsWithIndex(distributePartition), new HashPartitioner(numPartitions)), numPartitions, partitionCoalescer).values } else { new CoalescedRDD(this, numPartitions, partitionCoalescer) } } {code} > java.lang.IllegalArgumentException: bound must be positive : Worked in 1.5.2 > > > Key: SPARK-16798 > URL: https://issues.apache.org/jira/browse/SPARK-16798 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.0.0 >Reporter: Charles Allen
[jira] [Commented] (SPARK-16798) java.lang.IllegalArgumentException: bound must be positive : Worked in 1.5.2
[ https://issues.apache.org/jira/browse/SPARK-16798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15406967#comment-15406967 ] Charles Allen commented on SPARK-16798: --- [~srowen] I manually went in with IntelliJ debugging and can confirm that the driver DOES have a valid positive integer value for numPartitions when in the DRIVER. But when running in the mesos executor or in local[4], the task has a value of 0 consistently. I have been able to reproduce this with TPCH data, so I can share it around if you can point me to someone who can help debug what might have changed. The code snippet below is from RDD.scala, with my own comments added {code:title=RDD.scala} def coalesce(numPartitions: Int, shuffle: Boolean = false, partitionCoalescer: Option[PartitionCoalescer] = Option.empty) (implicit ord: Ordering[T] = null) : RDD[T] = withScope { require(numPartitions > 0, s"Number of partitions ($numPartitions) must be positive.") // Correct on DRIVER if (shuffle) { /** Distributes elements evenly across output partitions, starting from a random partition. */ val distributePartition = (index: Int, items: Iterator[T]) => { var position = (new Random(index)).nextInt(numPartitions) // numPartitions == 0 in TASK items.map { t => // Note that the hash code of the key will just be the key itself. The HashPartitioner // will mod it with the number of total partitions. position = position + 1 (position, t) } } : Iterator[(Int, T)] // include a shuffle step so that our upstream tasks are still distributed new CoalescedRDD( new ShuffledRDD[Int, T, T](mapPartitionsWithIndex(distributePartition), new HashPartitioner(numPartitions)), numPartitions, partitionCoalescer).values } else { new CoalescedRDD(this, numPartitions, partitionCoalescer) } } {code} > java.lang.IllegalArgumentException: bound must be positive : Worked in 1.5.2 > > > Key: SPARK-16798 > URL: https://issues.apache.org/jira/browse/SPARK-16798 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.0.0 >Reporter: Charles Allen
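The failing path is narrow: coalesce with shuffle=true routes rows through distributePartition, and each task seeds a Random with its partition index and calls nextInt on the captured numPartitions. A sketch of the call shape that exercises it, with a hypothetical input path:

{code:title=CoalesceShuffleShape.scala}
import org.apache.spark.{SparkConf, SparkContext}

object CoalesceShuffleShape {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("coalesce-repro").setMaster("local[4]"))
    val rdd = sc.textFile("s3n://bucket/input/*.gz") // hypothetical input
    // shuffle = true takes the distributePartition branch quoted above; the
    // closure it builds captures numPartitions (here, 100), and the comment
    // above reports that value arriving as 0 inside tasks.
    rdd.coalesce(100, shuffle = true).count()
    sc.stop()
  }
}
{code}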
[jira] [Commented] (SPARK-16798) java.lang.IllegalArgumentException: bound must be positive : Worked in 1.5.2
[ https://issues.apache.org/jira/browse/SPARK-16798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15403116#comment-15403116 ] Charles Allen commented on SPARK-16798: --- Yep, still happens: {code} 16/08/02 00:41:17 INFO HadoopRDD: Input split: REDACTED.gz:0+7389144 16/08/02 00:41:17 INFO TorrentBroadcast: Started reading broadcast variable 0 16/08/02 00:41:17 INFO TransportClientFactory: Successfully created connection to /<> after 1 ms (0 ms spent in bootstraps) 16/08/02 00:41:17 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 18.2 KB, free 3.6 GB) 16/08/02 00:41:17 INFO TorrentBroadcast: Reading broadcast variable 0 took 34 ms 16/08/02 00:41:17 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 209.2 KB, free 3.6 GB) 16/08/02 00:41:18 INFO deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id 16/08/02 00:41:18 INFO deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id 16/08/02 00:41:18 INFO deprecation: mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap 16/08/02 00:41:18 INFO deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition 16/08/02 00:41:18 INFO deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id 16/08/02 00:41:18 INFO NativeS3FileSystem: Opening 'REDACTED.gz' for reading 16/08/02 00:41:18 INFO CodecPool: Got brand-new decompressor [.gz] 16/08/02 00:41:19 ERROR Executor: Exception in task 11.0 in stage 0.0 (TID 11) java.lang.IllegalArgumentException: bound must be positive at java.util.Random.nextInt(Random.java:388) at org.apache.spark.rdd.RDD$$anonfun$coalesce$1$$anonfun$9.apply(RDD.scala:445) at org.apache.spark.rdd.RDD$$anonfun$coalesce$1$$anonfun$9.apply(RDD.scala:444) at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$25.apply(RDD.scala:801) at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$25.apply(RDD.scala:801) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319) at org.apache.spark.rdd.RDD.iterator(RDD.scala:283) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:79) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:47) at org.apache.spark.scheduler.Task.run(Task.scala:85) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) {code} > java.lang.IllegalArgumentException: bound must be positive : Worked in 1.5.2 > > > Key: SPARK-16798 > URL: https://issues.apache.org/jira/browse/SPARK-16798 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.0.0 >Reporter: Charles Allen > > Code at https://github.com/metamx/druid-spark-batch which was working under > 1.5.2 has ceased to function under 2.0.0 with the below stacktrace. 
> {code} > java.lang.IllegalArgumentException: bound must be positive > at java.util.Random.nextInt(Random.java:388) > at > org.apache.spark.rdd.RDD$$anonfun$coalesce$1$$anonfun$9.apply(RDD.scala:445) > at > org.apache.spark.rdd.RDD$$anonfun$coalesce$1$$anonfun$9.apply(RDD.scala:444) > at > org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$25.apply(RDD.scala:807) > at > org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$25.apply(RDD.scala:807) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:283) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:79) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:47) > at org.apache.spark.scheduler.Task.run(Task.scala:85) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional
[jira] [Comment Edited] (SPARK-16798) java.lang.IllegalArgumentException: bound must be positive : Worked in 1.5.2
[ https://issues.apache.org/jira/browse/SPARK-16798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15402676#comment-15402676 ] Charles Allen edited comment on SPARK-16798 at 8/1/16 7:30 PM: --- Minor update. Due to library collisions I have to change around how some of the tagging works internally. I'm cutting an internal-only (MMX) release of https://github.com/metamx/spark/commit/13650fc58e1fcf2cf2a26ba11c819185ae1acc1f with a new tag/version to prevent potential version conflicts in our infrastructure. Didn't want to mess with it over the weekend so new build is making its way through now. was (Author: drcrallen): Minor update. Due to library collisions I have to change around how some of the tagging works internally. I'm cutting an internal-only release of https://github.com/metamx/spark/commit/13650fc58e1fcf2cf2a26ba11c819185ae1acc1f with a new tag/version to prevent potential version conflicts in our infrastructure. Didn't want to mess with it over the weekend so new build is making its way through now. > java.lang.IllegalArgumentException: bound must be positive : Worked in 1.5.2 > > > Key: SPARK-16798 > URL: https://issues.apache.org/jira/browse/SPARK-16798 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.0.0 >Reporter: Charles Allen > > Code at https://github.com/metamx/druid-spark-batch which was working under > 1.5.2 has ceased to function under 2.0.0 with the below stacktrace. > {code} > java.lang.IllegalArgumentException: bound must be positive > at java.util.Random.nextInt(Random.java:388) > at > org.apache.spark.rdd.RDD$$anonfun$coalesce$1$$anonfun$9.apply(RDD.scala:445) > at > org.apache.spark.rdd.RDD$$anonfun$coalesce$1$$anonfun$9.apply(RDD.scala:444) > at > org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$25.apply(RDD.scala:807) > at > org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$25.apply(RDD.scala:807) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:283) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:79) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:47) > at org.apache.spark.scheduler.Task.run(Task.scala:85) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-16798) java.lang.IllegalArgumentException: bound must be positive : Worked in 1.5.2
[ https://issues.apache.org/jira/browse/SPARK-16798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15402676#comment-15402676 ] Charles Allen commented on SPARK-16798: --- Minor update. Due to library collisions I have to change around how some of the tagging works internally. I'm cutting an internal-only release of https://github.com/metamx/spark/commit/13650fc58e1fcf2cf2a26ba11c819185ae1acc1f with a new tag/version to prevent potential version conflicts in our infrastructure. Didn't want to mess with it over the weekend so new build is making its way through now. > java.lang.IllegalArgumentException: bound must be positive : Worked in 1.5.2 > > > Key: SPARK-16798 > URL: https://issues.apache.org/jira/browse/SPARK-16798 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.0.0 >Reporter: Charles Allen > > Code at https://github.com/metamx/druid-spark-batch which was working under > 1.5.2 has ceased to function under 2.0.0 with the below stacktrace. > {code} > java.lang.IllegalArgumentException: bound must be positive > at java.util.Random.nextInt(Random.java:388) > at > org.apache.spark.rdd.RDD$$anonfun$coalesce$1$$anonfun$9.apply(RDD.scala:445) > at > org.apache.spark.rdd.RDD$$anonfun$coalesce$1$$anonfun$9.apply(RDD.scala:444) > at > org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$25.apply(RDD.scala:807) > at > org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$25.apply(RDD.scala:807) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:283) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:79) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:47) > at org.apache.spark.scheduler.Task.run(Task.scala:85) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-16798) java.lang.IllegalArgumentException: bound must be positive : Worked in 1.5.2
[ https://issues.apache.org/jira/browse/SPARK-16798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15400089#comment-15400089 ] Charles Allen edited comment on SPARK-16798 at 7/29/16 10:06 PM: - Adding some more flavor, this is running in Mesos coarse mode against 0.28.2. If I take a subset of the data that failed and run it locally (local[4] or local[1]), it succeeds, which is annoying. here are the info logs from the failing tasks: {code} 16/07/29 18:19:20 INFO HadoopRDD: Input split: REDACTED1:0+163064 16/07/29 18:19:20 INFO TorrentBroadcast: Started reading broadcast variable 0 16/07/29 18:19:20 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 18.2 KB, free 3.6 GB) 16/07/29 18:19:20 INFO TorrentBroadcast: Reading broadcast variable 0 took 10 ms 16/07/29 18:19:20 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 209.2 KB, free 3.6 GB) 16/07/29 18:19:20 INFO deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id 16/07/29 18:19:20 INFO deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id 16/07/29 18:19:20 INFO deprecation: mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap 16/07/29 18:19:20 INFO deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition 16/07/29 18:19:20 INFO deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id 16/07/29 18:19:21 INFO NativeS3FileSystem: Opening 'REDACTED1' for reading 16/07/29 18:19:21 INFO CodecPool: Got brand-new decompressor [.gz] 16/07/29 18:19:21 ERROR Executor: Exception in task 9.0 in stage 0.0 (TID 9) java.lang.IllegalArgumentException: bound must be positive at java.util.Random.nextInt(Random.java:388) at org.apache.spark.rdd.RDD$$anonfun$coalesce$1$$anonfun$9.apply(RDD.scala:445) at org.apache.spark.rdd.RDD$$anonfun$coalesce$1$$anonfun$9.apply(RDD.scala:444) at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$25.apply(RDD.scala:807) at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$25.apply(RDD.scala:807) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319) at org.apache.spark.rdd.RDD.iterator(RDD.scala:283) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:79) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:47) at org.apache.spark.scheduler.Task.run(Task.scala:85) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) 16/07/29 18:19:21 INFO CoarseGrainedExecutorBackend: Got assigned task 14 16/07/29 18:19:21 INFO Executor: Running task 14.0 in stage 0.0 (TID 14) 16/07/29 18:19:21 INFO HadoopRDD: Input split: REDACTED2:0+157816 16/07/29 18:19:21 INFO NativeS3FileSystem: Opening 'REDACTED2' for reading 16/07/29 18:19:21 INFO CodecPool: Got brand-new decompressor [.gz] 16/07/29 18:19:21 ERROR Executor: Exception in task 14.0 in stage 0.0 (TID 14) java.lang.IllegalArgumentException: bound must be positive at java.util.Random.nextInt(Random.java:388) at org.apache.spark.rdd.RDD$$anonfun$coalesce$1$$anonfun$9.apply(RDD.scala:445) at org.apache.spark.rdd.RDD$$anonfun$coalesce$1$$anonfun$9.apply(RDD.scala:444) at 
org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$25.apply(RDD.scala:807) at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$25.apply(RDD.scala:807) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319) at org.apache.spark.rdd.RDD.iterator(RDD.scala:283) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:79) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:47) at org.apache.spark.scheduler.Task.run(Task.scala:85) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) 16/07/29 18:19:21 INFO CoarseGrainedExecutorBackend: Got assigned task 15 16/07/29 18:19:21 INFO Executor: Running task 9.1 in stage 0.0 (TID 15) {code} was (Author: drcrallen): Adding some more flavor, this is running in Mesos coarse mode against
[jira] [Commented] (SPARK-16798) java.lang.IllegalArgumentException: bound must be positive : Worked in 1.5.2
[ https://issues.apache.org/jira/browse/SPARK-16798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15400089#comment-15400089 ] Charles Allen commented on SPARK-16798: --- Adding some more flavor, this is running in Mesos coarse mode against 0.28.2. If I take a subset of the data that failed and run it locally (local[4] or local[1]), it succeeds, which is annoying. here are the info logs from the failing tasks: {code} 16/07/29 18:19:20 INFO HadoopRDD: Input split: REDACTED1.gz:0+163064 16/07/29 18:19:20 INFO TorrentBroadcast: Started reading broadcast variable 0 16/07/29 18:19:20 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 18.2 KB, free 3.6 GB) 16/07/29 18:19:20 INFO TorrentBroadcast: Reading broadcast variable 0 took 10 ms 16/07/29 18:19:20 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 209.2 KB, free 3.6 GB) 16/07/29 18:19:20 INFO deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id 16/07/29 18:19:20 INFO deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id 16/07/29 18:19:20 INFO deprecation: mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap 16/07/29 18:19:20 INFO deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition 16/07/29 18:19:20 INFO deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id 16/07/29 18:19:21 INFO NativeS3FileSystem: Opening 'REDACTED1' for reading 16/07/29 18:19:21 INFO CodecPool: Got brand-new decompressor [.gz] 16/07/29 18:19:21 ERROR Executor: Exception in task 9.0 in stage 0.0 (TID 9) java.lang.IllegalArgumentException: bound must be positive at java.util.Random.nextInt(Random.java:388) at org.apache.spark.rdd.RDD$$anonfun$coalesce$1$$anonfun$9.apply(RDD.scala:445) at org.apache.spark.rdd.RDD$$anonfun$coalesce$1$$anonfun$9.apply(RDD.scala:444) at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$25.apply(RDD.scala:807) at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$25.apply(RDD.scala:807) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319) at org.apache.spark.rdd.RDD.iterator(RDD.scala:283) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:79) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:47) at org.apache.spark.scheduler.Task.run(Task.scala:85) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) 16/07/29 18:19:21 INFO CoarseGrainedExecutorBackend: Got assigned task 14 16/07/29 18:19:21 INFO Executor: Running task 14.0 in stage 0.0 (TID 14) 16/07/29 18:19:21 INFO HadoopRDD: Input split: REDACTED2:0+157816 16/07/29 18:19:21 INFO NativeS3FileSystem: Opening 'REDACTED2' for reading 16/07/29 18:19:21 INFO CodecPool: Got brand-new decompressor [.gz] 16/07/29 18:19:21 ERROR Executor: Exception in task 14.0 in stage 0.0 (TID 14) java.lang.IllegalArgumentException: bound must be positive at java.util.Random.nextInt(Random.java:388) at org.apache.spark.rdd.RDD$$anonfun$coalesce$1$$anonfun$9.apply(RDD.scala:445) at org.apache.spark.rdd.RDD$$anonfun$coalesce$1$$anonfun$9.apply(RDD.scala:444) at 
org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$25.apply(RDD.scala:807) at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$25.apply(RDD.scala:807) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319) at org.apache.spark.rdd.RDD.iterator(RDD.scala:283) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:79) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:47) at org.apache.spark.scheduler.Task.run(Task.scala:85) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) 16/07/29 18:19:21 INFO CoarseGrainedExecutorBackend: Got assigned task 15 16/07/29 18:19:21 INFO Executor: Running task 9.1 in stage 0.0 (TID 15) {code} > java.lang.IllegalArgumentException: bound must be positive : Worked in 1.5.2 >
[jira] [Commented] (SPARK-16798) java.lang.IllegalArgumentException: bound must be positive : Worked in 1.5.2
[ https://issues.apache.org/jira/browse/SPARK-16798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15400070#comment-15400070 ] Charles Allen commented on SPARK-16798: --- I am definitely running a *modified* 2.0.0, but modifications are in the scheduler, not the RDD paths. Right now I'm running 1.5.2_2.11 through the deployment system to get as close to apples-to-apples as I can (and so that workflows can be swapped between the two ad-hoc) > java.lang.IllegalArgumentException: bound must be positive : Worked in 1.5.2 > > > Key: SPARK-16798 > URL: https://issues.apache.org/jira/browse/SPARK-16798 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.0.0 >Reporter: Charles Allen > > Code at https://github.com/metamx/druid-spark-batch which was working under > 1.5.2 has ceased to function under 2.0.0 with the below stacktrace. > {code} > java.lang.IllegalArgumentException: bound must be positive > at java.util.Random.nextInt(Random.java:388) > at > org.apache.spark.rdd.RDD$$anonfun$coalesce$1$$anonfun$9.apply(RDD.scala:445) > at > org.apache.spark.rdd.RDD$$anonfun$coalesce$1$$anonfun$9.apply(RDD.scala:444) > at > org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$25.apply(RDD.scala:807) > at > org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$25.apply(RDD.scala:807) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:283) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:79) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:47) > at org.apache.spark.scheduler.Task.run(Task.scala:85) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-16798) java.lang.IllegalArgumentException: bound must be positive : Worked in 1.5.2
[ https://issues.apache.org/jira/browse/SPARK-16798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15400071#comment-15400071 ] Charles Allen commented on SPARK-16798: --- I'll run 2.0.0 stock as another test that will go out during this push. > java.lang.IllegalArgumentException: bound must be positive : Worked in 1.5.2 > > > Key: SPARK-16798 > URL: https://issues.apache.org/jira/browse/SPARK-16798 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.0.0 >Reporter: Charles Allen > > Code at https://github.com/metamx/druid-spark-batch which was working under > 1.5.2 has ceased to function under 2.0.0 with the below stacktrace. > {code} > java.lang.IllegalArgumentException: bound must be positive > at java.util.Random.nextInt(Random.java:388) > at > org.apache.spark.rdd.RDD$$anonfun$coalesce$1$$anonfun$9.apply(RDD.scala:445) > at > org.apache.spark.rdd.RDD$$anonfun$coalesce$1$$anonfun$9.apply(RDD.scala:444) > at > org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$25.apply(RDD.scala:807) > at > org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$25.apply(RDD.scala:807) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:283) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:79) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:47) > at org.apache.spark.scheduler.Task.run(Task.scala:85) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-16798) java.lang.IllegalArgumentException: bound must be positive : Worked in 1.5.2
[ https://issues.apache.org/jira/browse/SPARK-16798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15400014#comment-15400014 ] Charles Allen commented on SPARK-16798: --- The super odd thing here is that RDD.scala:445 *SHOULD* be protected by the check introduced in https://github.com/apache/spark/pull/13282 , but for some reason it does not seem to be. > java.lang.IllegalArgumentException: bound must be positive : Worked in 1.5.2 > > > Key: SPARK-16798 > URL: https://issues.apache.org/jira/browse/SPARK-16798 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.0.0 >Reporter: Charles Allen > > Code at https://github.com/metamx/druid-spark-batch which was working under > 1.5.2 has ceased to function under 2.0.0 with the below stacktrace. > {code} > java.lang.IllegalArgumentException: bound must be positive > at java.util.Random.nextInt(Random.java:388) > at > org.apache.spark.rdd.RDD$$anonfun$coalesce$1$$anonfun$9.apply(RDD.scala:445) > at > org.apache.spark.rdd.RDD$$anonfun$coalesce$1$$anonfun$9.apply(RDD.scala:444) > at > org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$25.apply(RDD.scala:807) > at > org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$25.apply(RDD.scala:807) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:283) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:79) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:47) > at org.apache.spark.scheduler.Task.run(Task.scala:85) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
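For context on the failing frame above: the shuffle branch of RDD.coalesce seeds a per-partition java.util.Random and picks a starting position with nextInt on the target partition count, so the call throws as soon as that count is not strictly positive. A minimal sketch of the failure mode (illustrative Scala that approximates, rather than copies, the Spark 2.0.0 source at RDD.scala:445; the variable names are hypothetical):
{code}
import java.util.Random

// Approximation of the coalesce(shuffle = true) start-position logic.
// If the computed partition count collapses to 0, nextInt throws the
// "bound must be positive" IllegalArgumentException seen in the traces.
val numPartitions = 0    // hypothetical degenerate partition count
val partitionIndex = 11  // hypothetical partition index used as the seed
val position = new Random(partitionIndex).nextInt(numPartitions)
// => java.lang.IllegalArgumentException: bound must be positive
{code}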
[jira] [Commented] (SPARK-16798) java.lang.IllegalArgumentException: bound must be positive : Worked in 1.5.2
[ https://issues.apache.org/jira/browse/SPARK-16798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15399684#comment-15399684 ] Charles Allen commented on SPARK-16798: --- I guess it doesn't need to be open, it can be closed, but I'll see if I can get better testing around it regardless. > java.lang.IllegalArgumentException: bound must be positive : Worked in 1.5.2 > > > Key: SPARK-16798 > URL: https://issues.apache.org/jira/browse/SPARK-16798 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.0.0 >Reporter: Charles Allen > > Code at https://github.com/metamx/druid-spark-batch which was working under > 1.5.2 has ceased to function under 2.0.0 with the below stacktrace. > {code} > java.lang.IllegalArgumentException: bound must be positive > at java.util.Random.nextInt(Random.java:388) > at > org.apache.spark.rdd.RDD$$anonfun$coalesce$1$$anonfun$9.apply(RDD.scala:445) > at > org.apache.spark.rdd.RDD$$anonfun$coalesce$1$$anonfun$9.apply(RDD.scala:444) > at > org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$25.apply(RDD.scala:807) > at > org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$25.apply(RDD.scala:807) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:283) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:79) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:47) > at org.apache.spark.scheduler.Task.run(Task.scala:85) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-16798) java.lang.IllegalArgumentException: bound must be positive : Worked in 1.5.2
[ https://issues.apache.org/jira/browse/SPARK-16798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15399682#comment-15399682 ] Charles Allen commented on SPARK-16798: --- [~srowen] sorry about that, fixed priority. It's a blocker from my side and I'll get it reproducible for here. If its ok I'd like to keep this ticket open for a few days while I get a reproducible test to show the behavior. > java.lang.IllegalArgumentException: bound must be positive : Worked in 1.5.2 > > > Key: SPARK-16798 > URL: https://issues.apache.org/jira/browse/SPARK-16798 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.0.0 >Reporter: Charles Allen > > Code at https://github.com/metamx/druid-spark-batch which was working under > 1.5.2 has ceased to function under 2.0.0 with the below stacktrace. > {code} > java.lang.IllegalArgumentException: bound must be positive > at java.util.Random.nextInt(Random.java:388) > at > org.apache.spark.rdd.RDD$$anonfun$coalesce$1$$anonfun$9.apply(RDD.scala:445) > at > org.apache.spark.rdd.RDD$$anonfun$coalesce$1$$anonfun$9.apply(RDD.scala:444) > at > org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$25.apply(RDD.scala:807) > at > org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$25.apply(RDD.scala:807) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:283) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:79) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:47) > at org.apache.spark.scheduler.Task.run(Task.scala:85) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-16798) java.lang.IllegalArgumentException: bound must be positive : Worked in 1.5.2
[ https://issues.apache.org/jira/browse/SPARK-16798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Charles Allen updated SPARK-16798: -- Priority: Major (was: Blocker) > java.lang.IllegalArgumentException: bound must be positive : Worked in 1.5.2 > > > Key: SPARK-16798 > URL: https://issues.apache.org/jira/browse/SPARK-16798 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.0.0 >Reporter: Charles Allen > > Code at https://github.com/metamx/druid-spark-batch which was working under > 1.5.2 has ceased to function under 2.0.0 with the below stacktrace. > {code} > java.lang.IllegalArgumentException: bound must be positive > at java.util.Random.nextInt(Random.java:388) > at > org.apache.spark.rdd.RDD$$anonfun$coalesce$1$$anonfun$9.apply(RDD.scala:445) > at > org.apache.spark.rdd.RDD$$anonfun$coalesce$1$$anonfun$9.apply(RDD.scala:444) > at > org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$25.apply(RDD.scala:807) > at > org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$25.apply(RDD.scala:807) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:283) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:79) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:47) > at org.apache.spark.scheduler.Task.run(Task.scala:85) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-16798) java.lang.IllegalArgumentException: bound must be positive : Worked in 1.5.2
Charles Allen created SPARK-16798: - Summary: java.lang.IllegalArgumentException: bound must be positive : Worked in 1.5.2 Key: SPARK-16798 URL: https://issues.apache.org/jira/browse/SPARK-16798 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 2.0.0 Reporter: Charles Allen Priority: Blocker Code at https://github.com/metamx/druid-spark-batch which was working under 1.5.2 has ceased to function under 2.0.0 with the below stacktrace. {code} java.lang.IllegalArgumentException: bound must be positive at java.util.Random.nextInt(Random.java:388) at org.apache.spark.rdd.RDD$$anonfun$coalesce$1$$anonfun$9.apply(RDD.scala:445) at org.apache.spark.rdd.RDD$$anonfun$coalesce$1$$anonfun$9.apply(RDD.scala:444) at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$25.apply(RDD.scala:807) at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$25.apply(RDD.scala:807) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319) at org.apache.spark.rdd.RDD.iterator(RDD.scala:283) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:79) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:47) at org.apache.spark.scheduler.Task.run(Task.scala:85) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-16379) Spark on mesos is broken due to race condition in Logging
[ https://issues.apache.org/jira/browse/SPARK-16379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15365117#comment-15365117 ] Charles Allen commented on SPARK-16379: --- That's great, thanks a ton! > Spark on mesos is broken due to race condition in Logging > - > > Key: SPARK-16379 > URL: https://issues.apache.org/jira/browse/SPARK-16379 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.0.0 >Reporter: Stavros Kontopoulos >Assignee: Sean Owen >Priority: Blocker > Fix For: 2.0.0 > > Attachments: out.txt > > > This commit introduced a transient lazy log val: > https://github.com/apache/spark/commit/044971eca0ff3c2ce62afa665dbd3072d52cbbec > This has caused problems in the past: > https://github.com/apache/spark/pull/1004 > One commit before that everything works fine. > I spotted that when my CI started to fail: > https://ci.typesafe.com/job/mit-docker-test-ref/191/ > You can easily verify it by installing mesos on your machine and try to > connect with spark shell from bin dir: > ./spark-shell --master mesos://zk://localhost:2181/mesos --conf > spark.executor.url=$(pwd)/../spark-2.0.0-SNAPSHOT-bin-test.tgz > It gets stuck at the point where it tries to create the SparkContext. > Logging gets stuck here: > I0705 12:10:10.076617 9303 group.cpp:700] Trying to get > '/mesos/json.info_000152' in ZooKeeper > I0705 12:10:10.076920 9304 detector.cpp:479] A new leading master > (UPID=master@127.0.1.1:5050) is detected > I0705 12:10:10.076956 9303 sched.cpp:326] New master detected at > master@127.0.1.1:5050 > I0705 12:10:10.077057 9303 sched.cpp:336] No credentials provided. > Attempting to register without authentication > I0705 12:10:10.090709 9301 sched.cpp:703] Framework registered with > 13553f8b-f42c-4f20-88cd-16f1cc153ede-0001 > I verified it also by changing @transient lazy val log to def and it works as > expected. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
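The fix described in the quoted issue (changing @transient lazy val log to def) hinges on lazy val initialization being synchronized. A rough illustration of the two shapes being contrasted (a sketch only, not the actual Spark Logging trait):
{code}
import org.slf4j.{Logger, LoggerFactory}

trait LazyValLogging {
  // lazy val: the first access takes an initialization lock, which can
  // race or deadlock when hit from multiple threads during startup
  @transient lazy val log: Logger = LoggerFactory.getLogger(getClass)
}

trait DefLogging {
  // def: no cached state and no initialization lock; the logger is
  // resolved on every call
  def log: Logger = LoggerFactory.getLogger(getClass)
}
{code}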
[jira] [Commented] (SPARK-16379) Spark on mesos is broken due to race condition in Logging
[ https://issues.apache.org/jira/browse/SPARK-16379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15365109#comment-15365109 ] Charles Allen commented on SPARK-16379: --- [~srowen] is there a list of blockers somewhere? I also want to get branch-2.0 tested from our side but would like to know what sort of caveats to expect. > Spark on mesos is broken due to race condition in Logging > - > > Key: SPARK-16379 > URL: https://issues.apache.org/jira/browse/SPARK-16379 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.0.0 >Reporter: Stavros Kontopoulos >Assignee: Sean Owen >Priority: Blocker > Fix For: 2.0.0 > > Attachments: out.txt > > > This commit introduced a transient lazy log val: > https://github.com/apache/spark/commit/044971eca0ff3c2ce62afa665dbd3072d52cbbec > This has caused problems in the past: > https://github.com/apache/spark/pull/1004 > One commit before that everything works fine. > I spotted that when my CI started to fail: > https://ci.typesafe.com/job/mit-docker-test-ref/191/ > You can easily verify it by installing mesos on your machine and try to > connect with spark shell from bin dir: > ./spark-shell --master mesos://zk://localhost:2181/mesos --conf > spark.executor.url=$(pwd)/../spark-2.0.0-SNAPSHOT-bin-test.tgz > It gets stuck at the point where it tries to create the SparkContext. > Logging gets stuck here: > I0705 12:10:10.076617 9303 group.cpp:700] Trying to get > '/mesos/json.info_000152' in ZooKeeper > I0705 12:10:10.076920 9304 detector.cpp:479] A new leading master > (UPID=master@127.0.1.1:5050) is detected > I0705 12:10:10.076956 9303 sched.cpp:326] New master detected at > master@127.0.1.1:5050 > I0705 12:10:10.077057 9303 sched.cpp:336] No credentials provided. > Attempting to register without authentication > I0705 12:10:10.090709 9301 sched.cpp:703] Framework registered with > 13553f8b-f42c-4f20-88cd-16f1cc153ede-0001 > I verified it also by changing @transient lazy val log to def and it works as > expected. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-6028) Provide an alternative RPC implementation based on the network transport module
[ https://issues.apache.org/jira/browse/SPARK-6028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15364762#comment-15364762 ] Charles Allen commented on SPARK-6028: -- ClassLoader problem on my side. The loader was pulling in 1.5.2 classes for the driver but 1.6.1 classes in the tasks. Ideally the default behavior would not have changed: the tasks would have launched, and the class version conflict would have shown up in the logs rather than surfacing as a URI naming conflict. > Provide an alternative RPC implementation based on the network transport > module > --- > > Key: SPARK-6028 > URL: https://issues.apache.org/jira/browse/SPARK-6028 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Reporter: Reynold Xin >Assignee: Shixiong Zhu >Priority: Critical > Fix For: 1.6.0 > > > Network transport module implements a low level RPC interface. We can build a > new RPC implementation on top of that to replace Akka's. > Design document: > https://docs.google.com/document/d/1CF5G6rGVQMKSyV_QKo4D2M-x6rxz5x1Ew7aK3Uq6u8c/edit?usp=sharing -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
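A generic JVM diagnostic for spotting this kind of mixed-version classpath (not from the ticket; it uses only the standard java.security API) is to print which jar a Spark class was actually loaded from, on both the driver and inside a task:
{code}
// Prints something like file:/.../spark-core_2.10-1.6.1.jar; compare the
// output on the driver and in a task to catch mismatched Spark versions.
// (getCodeSource can be null for bootstrap-classpath classes.)
println(classOf[org.apache.spark.SparkContext]
  .getProtectionDomain.getCodeSource.getLocation)
{code}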
[jira] [Commented] (SPARK-6028) Provide an alternative RPC implementation based on the network transport module
[ https://issues.apache.org/jira/browse/SPARK-6028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15364757#comment-15364757 ] Charles Allen commented on SPARK-6028: -- Was semi-related. The patch changed the default RPC implementation from Akka to Netty, and I had an improper classloader in my app that was loading the 1.5.2 classes instead of the 1.6.1 classes. > Provide an alternative RPC implementation based on the network transport > module > --- > > Key: SPARK-6028 > URL: https://issues.apache.org/jira/browse/SPARK-6028 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Reporter: Reynold Xin >Assignee: Shixiong Zhu >Priority: Critical > Fix For: 1.6.0 > > > Network transport module implements a low level RPC interface. We can build a > new RPC implementation on top of that to replace Akka's. > Design document: > https://docs.google.com/document/d/1CF5G6rGVQMKSyV_QKo4D2M-x6rxz5x1Ew7aK3Uq6u8c/edit?usp=sharing -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-6028) Provide an alternative RPC implementation based on the network transport module
[ https://issues.apache.org/jira/browse/SPARK-6028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15364589#comment-15364589 ] Charles Allen commented on SPARK-6028: -- Not sure if this change is the reason for the failure below. > Provide an alternative RPC implementation based on the network transport > module > --- > > Key: SPARK-6028 > URL: https://issues.apache.org/jira/browse/SPARK-6028 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Reporter: Reynold Xin >Assignee: Shixiong Zhu >Priority: Critical > Fix For: 1.6.0 > > > Network transport module implements a low level RPC interface. We can build a > new RPC implementation on top of that to replace Akka's. > Design document: > https://docs.google.com/document/d/1CF5G6rGVQMKSyV_QKo4D2M-x6rxz5x1Ew7aK3Uq6u8c/edit?usp=sharing -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-6028) Provide an alternative RPC implementation based on the network transport module
[ https://issues.apache.org/jira/browse/SPARK-6028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15364587#comment-15364587 ] Charles Allen commented on SPARK-6028: -- I just tried out 1.6.1, upgrading from 1.5.2, running Spark on Mesos. Everything was fine in 1.5.2. None of the Mesos backend executors launch anymore due to the following error (reported in failed tasks on the Mesos slaves): {code} Exception in thread "main" java.lang.reflect.UndeclaredThrowableException at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1563) at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:68) at org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:157) at org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:259) at org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala) Caused by: org.apache.spark.SparkException: Invalid Spark URL: akka.tcp://sparkDriver@HOST_REDACTED:43709/user/CoarseGrainedScheduler at org.apache.spark.rpc.netty.RpcEndpointAddress$.apply(RpcEndpointAddress.scala:62) at org.apache.spark.rpc.netty.NettyRpcEnv.asyncSetupEndpointRefByURI(NettyRpcEnv.scala:140) at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:97) at org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$run$1.apply$mcV$sp(CoarseGrainedExecutorBackend.scala:171) at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:69) at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:68) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) ... 4 more {code} > Provide an alternative RPC implementation based on the network transport > module > --- > > Key: SPARK-6028 > URL: https://issues.apache.org/jira/browse/SPARK-6028 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Reporter: Reynold Xin >Assignee: Shixiong Zhu >Priority: Critical > Fix For: 1.6.0 > > > Network transport module implements a low level RPC interface. We can build a > new RPC implementation on top of that to replace Akka's. > Design document: > https://docs.google.com/document/d/1CF5G6rGVQMKSyV_QKo4D2M-x6rxz5x1Ew7aK3Uq6u8c/edit?usp=sharing -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-11714) Make Spark on Mesos honor port restrictions
[ https://issues.apache.org/jira/browse/SPARK-11714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15359430#comment-15359430 ] Charles Allen commented on SPARK-11714: --- Each entry would then be evaluated with {code}.format(port_num){code} > Make Spark on Mesos honor port restrictions > --- > > Key: SPARK-11714 > URL: https://issues.apache.org/jira/browse/SPARK-11714 > Project: Spark > Issue Type: Improvement > Components: Mesos >Reporter: Charles Allen > > Currently the MesosSchedulerBackend does not make any effort to honor "ports" > as a resource offer in Mesos. This ask is to have the ports which the > executor binds to honor the limits of the "ports" resource of an offer. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-11714) Make Spark on Mesos honor port restrictions
[ https://issues.apache.org/jira/browse/SPARK-11714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15359427#comment-15359427 ] Charles Allen commented on SPARK-11714: --- A real config might look something like {code}-Dspark.blockManager.port=%s,-Dspark.executor.port=%s,-Dspark.shuffle.service.port=%s,-Dcom.sun.management.jmxremote.port=%s{code} > Make Spark on Mesos honor port restrictions > --- > > Key: SPARK-11714 > URL: https://issues.apache.org/jira/browse/SPARK-11714 > Project: Spark > Issue Type: Improvement > Components: Mesos >Reporter: Charles Allen > > Currently the MesosSchedulerBackend does not make any effort to honor "ports" > as a resource offer in Mesos. This ask is to have the ports which the > executor binds to honor the limits of the "ports" resource of an offer. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-11714) Make Spark on Mesos honor port restrictions
[ https://issues.apache.org/jira/browse/SPARK-11714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15359413#comment-15359413 ] Charles Allen commented on SPARK-11714: --- My proposed solution to this is to have a new property for coarse mode which takes in a comma-separated list of format strings which will be passed as extra Java options to the executor. For example, if {code}-Dcom.sun.management.jmxremote.port=%s,-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=%s{code} is passed in the property, then both a JMX port and a remote debugging port will be acquired. > Make Spark on Mesos honor port restrictions > --- > > Key: SPARK-11714 > URL: https://issues.apache.org/jira/browse/SPARK-11714 > Project: Spark > Issue Type: Improvement > Components: Mesos >Reporter: Charles Allen > > Currently the MesosSchedulerBackend does not make any effort to honor "ports" > as a resource offer in Mesos. This ask is to have the ports which the > executor binds to honor the limits of the "ports" resource of an offer. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
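A sketch of how the proposed comma-separated format strings might be expanded against ports taken from a Mesos offer (the values and the expansion below are hypothetical; this mirrors the proposal, not shipped Spark code):
{code}
// Each template is a String.format-style entry that receives one port
// drawn from the offer's "ports" resource.
val templates = Seq(
  "-Dspark.blockManager.port=%s",
  "-Dcom.sun.management.jmxremote.port=%s"
)
val offeredPorts = Iterator(31000, 31001)  // hypothetical ports from the offer

val extraJavaOpts = templates.map(_.format(offeredPorts.next()))
// => Seq("-Dspark.blockManager.port=31000",
//        "-Dcom.sun.management.jmxremote.port=31001")
{code}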
[jira] [Closed] (SPARK-12248) Make Spark Coarse Mesos Scheduler obey limits on memory/cpu ratios
[ https://issues.apache.org/jira/browse/SPARK-12248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Charles Allen closed SPARK-12248. - Resolution: Fixed Fix Version/s: 2.0.0 > Make Spark Coarse Mesos Scheduler obey limits on memory/cpu ratios > -- > > Key: SPARK-12248 > URL: https://issues.apache.org/jira/browse/SPARK-12248 > Project: Spark > Issue Type: Improvement > Components: Mesos >Reporter: Charles Allen > Fix For: 2.0.0 > > > It is possible to have spark apps that work best with either more memory or > more CPU. > In a multi-tenant environment (such as Mesos) it can be very beneficial to be > able to limit the Coarse scheduler to guarantee an executor doesn't subscribe > to too many cpus or too much memory. > This ask is to add functionality to the Coarse Mesos Scheduler to have basic > limits to the ratio of memory to cpu, which default to the current behavior > of soaking up whatever resources it can. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-12248) Make Spark Coarse Mesos Scheduler obey limits on memory/cpu ratios
[ https://issues.apache.org/jira/browse/SPARK-12248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15334376#comment-15334376 ] Charles Allen commented on SPARK-12248: --- The limit of one task per slave seems to have been removed. That solves at least my use case in this matter. > Make Spark Coarse Mesos Scheduler obey limits on memory/cpu ratios > -- > > Key: SPARK-12248 > URL: https://issues.apache.org/jira/browse/SPARK-12248 > Project: Spark > Issue Type: Improvement > Components: Mesos >Reporter: Charles Allen > Fix For: 2.0.0 > > > It is possible to have spark apps that work best with either more memory or > more CPU. > In a multi-tenant environment (such as Mesos) it can be very beneficial to be > able to limit the Coarse scheduler to guarantee an executor doesn't subscribe > to too many cpus or too much memory. > This ask is to add functionality to the Coarse Mesos Scheduler to have basic > limits to the ratio of memory to cpu, which default to the current behavior > of soaking up whatever resources it can. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
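To make the request concrete, a toy version of the cap the description asks for could clamp what an executor takes from an offer so the memory-to-cpu ratio stays bounded (purely illustrative; this is not a config that shipped in Spark):
{code}
// Clamp offered memory so mem/cpu never exceeds maxMemPerCpuMb.
def cappedMem(offerMemMb: Double, offerCpus: Double, maxMemPerCpuMb: Double): Double =
  math.min(offerMemMb, offerCpus * maxMemPerCpuMb)

// e.g. an offer of 60000 MB / 4 cpus with a 4096 MB-per-cpu cap => 16384.0 MB
val take = cappedMem(60000, 4, 4096)
{code}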
[jira] [Updated] (SPARK-15992) Code cleanup mesos coarse backend offer evaluation workflow
[ https://issues.apache.org/jira/browse/SPARK-15992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Charles Allen updated SPARK-15992: -- Attachment: (was: 0001-Refactor-MesosCoarseGrainedSchedulerBackend-offer-co.patch) > Code cleanup mesos coarse backend offer evaluation workflow > --- > > Key: SPARK-15992 > URL: https://issues.apache.org/jira/browse/SPARK-15992 > Project: Spark > Issue Type: Improvement > Components: Mesos >Affects Versions: 2.0.0 >Reporter: Charles Allen > Labels: code-cleanup > > The offer acceptance workflow is a little hard to follow and not very > extensible for future considerations for offers. This is a patch that makes > the workflow a little more explicit in its handling of offer resources. > Patch incoming -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-15994) Allow enabling Mesos fetch cache in coarse executor backend
[ https://issues.apache.org/jira/browse/SPARK-15994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Charles Allen updated SPARK-15994: -- Attachment: (was: 0001-Add-ability-to-enable-mesos-fetch-cache.patch) > Allow enabling Mesos fetch cache in coarse executor backend > > > Key: SPARK-15994 > URL: https://issues.apache.org/jira/browse/SPARK-15994 > Project: Spark > Issue Type: Improvement > Components: Mesos >Affects Versions: 2.0.0 >Reporter: Charles Allen > > Mesos 0.23.0 introduces a Fetch Cache feature > http://mesos.apache.org/documentation/latest/fetcher/ which allows caching of > resources specified in command URIs. > This patch: > * Updates the Mesos shaded protobuf dependency to 0.23.0 > * Allows setting `spark.mesos.fetchCache.enable` to enable the fetch cache > for all specified URIs. (URIs must be specified for the setting to have any > effect) > * Updates documentation for Mesos configuration with the new setting. > This patch does NOT: > * Allow for per-URI caching configuration. The cache setting is global to ALL > URIs for the command. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-15994) Allow enabling Mesos fetch cache in coarse executor backend
[ https://issues.apache.org/jira/browse/SPARK-15994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Charles Allen updated SPARK-15994: -- Attachment: 0001-Add-ability-to-enable-mesos-fetch-cache.patch > Allow enabling Mesos fetch cache in coarse executor backend > > > Key: SPARK-15994 > URL: https://issues.apache.org/jira/browse/SPARK-15994 > Project: Spark > Issue Type: Improvement > Components: Mesos >Affects Versions: 2.0.0 >Reporter: Charles Allen > Attachments: 0001-Add-ability-to-enable-mesos-fetch-cache.patch > > > Mesos 0.23.0 introduces a Fetch Cache feature > http://mesos.apache.org/documentation/latest/fetcher/ which allows caching of > resources specified in command URIs. > This patch: > * Updates the Mesos shaded protobuf dependency to 0.23.0 > * Allows setting `spark.mesos.fetchCache.enable` to enable the fetch cache > for all specified URIs. (URIs must be specified for the setting to have any > effect) > * Updates documentation for Mesos configuration with the new setting. > This patch does NOT: > * Allow for per-URI caching configuration. The cache setting is global to ALL > URIs for the command. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-15994) Allow enabling Mesos fetch cache in coarse executor backend
Charles Allen created SPARK-15994: - Summary: Allow enabling Mesos fetch cache in coarse executor backend Key: SPARK-15994 URL: https://issues.apache.org/jira/browse/SPARK-15994 Project: Spark Issue Type: Improvement Components: Mesos Affects Versions: 2.0.0 Reporter: Charles Allen Mesos 0.23.0 introduces a Fetch Cache feature http://mesos.apache.org/documentation/latest/fetcher/ which allows caching of resources specified in command URIs. This patch: * Updates the Mesos shaded protobuf dependency to 0.23.0 * Allows setting `spark.mesos.fetchCache.enable` to enable the fetch cache for all specified URIs. (URIs must be specified for the setting to have any effect) * Updates documentation for Mesos configuration with the new setting. This patch does NOT: * Allow for per-URI caching configuration. The cache setting is global to ALL URIs for the command. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
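On a build carrying this patch, enabling the cache might look like the following (a sketch: the property name comes from the patch description above, and the executor URI is a placeholder):
{code}
import org.apache.spark.SparkConf

// Hypothetical usage; URIs must be specified for the setting to matter.
val conf = new SparkConf()
  .set("spark.executor.uri", "hdfs://example/spark-2.0.0-bin.tgz")  // placeholder
  .set("spark.mesos.fetchCache.enable", "true")
{code}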
[jira] [Created] (SPARK-15992) Code cleanup mesos offer evaluation workflow
Charles Allen created SPARK-15992: - Summary: Code cleanup mesos offer evaluation workflow Key: SPARK-15992 URL: https://issues.apache.org/jira/browse/SPARK-15992 Project: Spark Issue Type: Improvement Components: Mesos Affects Versions: 2.0.0 Reporter: Charles Allen Attachments: 0001-Refactor-MesosCoarseGrainedSchedulerBackend-offer-co.patch The offer acceptance workflow is a little hard to follow and not very extensible for future considerations for offers. This is a patch that makes the workflow a little more explicit in its handling of offer resources. Patch incoming -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-15992) Code cleanup mesos coarse backend offer evaluation workflow
[ https://issues.apache.org/jira/browse/SPARK-15992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Charles Allen updated SPARK-15992: -- Summary: Code cleanup mesos coarse backend offer evaluation workflow (was: Code cleanup mesos offer evaluation workflow) > Code cleanup mesos coarse backend offer evaluation workflow > --- > > Key: SPARK-15992 > URL: https://issues.apache.org/jira/browse/SPARK-15992 > Project: Spark > Issue Type: Improvement > Components: Mesos >Affects Versions: 2.0.0 >Reporter: Charles Allen > Labels: code-cleanup > Attachments: > 0001-Refactor-MesosCoarseGrainedSchedulerBackend-offer-co.patch > > > The offer acceptance workflow is a little hard to follow and not very > extensible for future considerations for offers. This is a patch that makes > the workflow a little more explicit in its handling of offer resources. > Patch incoming -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-15992) Code cleanup mesos coarse backend offer evaluation workflow
[ https://issues.apache.org/jira/browse/SPARK-15992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Charles Allen updated SPARK-15992: -- Attachment: 0001-Refactor-MesosCoarseGrainedSchedulerBackend-offer-co.patch > Code cleanup mesos coarse backend offer evaluation workflow > --- > > Key: SPARK-15992 > URL: https://issues.apache.org/jira/browse/SPARK-15992 > Project: Spark > Issue Type: Improvement > Components: Mesos >Affects Versions: 2.0.0 >Reporter: Charles Allen > Labels: code-cleanup > Attachments: > 0001-Refactor-MesosCoarseGrainedSchedulerBackend-offer-co.patch > > > The offer acceptance workflow is a little hard to follow and not very > extensible for future considerations for offers. This is a patch that makes > the workflow a little more explicit in its handling of offer resources. > Patch incoming -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-11183) enable support for mesos 0.24+
[ https://issues.apache.org/jira/browse/SPARK-11183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15320914#comment-15320914 ] Charles Allen commented on SPARK-11183: --- Eventually it could be worth adopting something like https://github.com/mesosphere/mesos-rxjava to plug into the Mesos cluster > enable support for mesos 0.24+ > -- > > Key: SPARK-11183 > URL: https://issues.apache.org/jira/browse/SPARK-11183 > Project: Spark > Issue Type: Bug > Components: Deploy, Mesos >Reporter: Ioannis Polyzos > > In Mesos 0.24, the Mesos leader info in ZK changed to JSON; this results in > Spark failing to run on 0.24+. > References: > https://issues.apache.org/jira/browse/MESOS-2340 > > http://mail-archives.apache.org/mod_mbox/mesos-commits/201506.mbox/%3ced4698dc56444bcdac3bdf19134db...@git.apache.org%3E > https://github.com/mesos/elasticsearch/issues/338 > https://github.com/spark-jobserver/spark-jobserver/issues/267 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-11183) enable support for mesos 0.24+
[ https://issues.apache.org/jira/browse/SPARK-11183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15320840#comment-15320840 ] Charles Allen commented on SPARK-11183: --- Being able to enable the fetch cache ( http://mesos.apache.org/documentation/latest/fetcher/ ) would also be nice > enable support for mesos 0.24+ > -- > > Key: SPARK-11183 > URL: https://issues.apache.org/jira/browse/SPARK-11183 > Project: Spark > Issue Type: Bug > Components: Deploy, Mesos >Reporter: Ioannis Polyzos > > In mesos 0.24, the mesos leader info in ZK has changed to json; this results in > spark failing to run on 0.24+. > References : > https://issues.apache.org/jira/browse/MESOS-2340 > > http://mail-archives.apache.org/mod_mbox/mesos-commits/201506.mbox/%3ced4698dc56444bcdac3bdf19134db...@git.apache.org%3E > https://github.com/mesos/elasticsearch/issues/338 > https://github.com/spark-jobserver/spark-jobserver/issues/267 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
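On the framework side, enabling the cache is mostly a matter of flagging fetcher URIs as cacheable. A sketch, assuming the Mesos java protos from 0.23+ where CommandInfo.URI carries a cache field:
{code}
// Sketch only: mark a fetched artifact as cacheable by the Mesos fetcher.
// The artifact URL is a placeholder.
import org.apache.mesos.Protos.CommandInfo

val uri = CommandInfo.URI.newBuilder()
  .setValue("http://example.com/spark-executor.tgz") // hypothetical artifact
  .setCache(true) // lets the agent's fetcher cache reuse the download
  .build()
val command = CommandInfo.newBuilder().addUris(uri)
{code}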
[jira] [Commented] (SPARK-6305) Add support for log4j 2.x to Spark
[ https://issues.apache.org/jira/browse/SPARK-6305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15289676#comment-15289676 ] Charles Allen commented on SPARK-6305: -- Shading is often used as an artificial ClassLoader with the exception that you can't replace classes by replacing jars. So if it is used as a "we don't want you to replace stock classes" then that's fine, but if it is used as "ClassLoader isolation and dependency tracking is hard" then ultimately that is bad. (Spark has lots of fun classloaders to deal with, I know it is not trivial) For a usage example from my side, the spark druid indexer at https://github.com/metamx/druid-spark-batch uses good-'ol-fashioned-jars (without shading or assembly) with some primitive classloader isolation through https://github.com/druid-io/druid/blob/master/indexing-service/src/main/java/io/druid/indexing/common/task/HadoopTask.java#L128 this means the following is in a directory which is loaded in a classloader for the driver: activation-1.1.1.jar akka-actor_2.10-2.3.11.jar akka-remote_2.10-2.3.11.jar akka-slf4j_2.10-2.3.11.jar aopalliance-1.0.jar asm-3.1.jar avro-1.7.7.jar avro-ipc-1.7.7.jar avro-ipc-1.7.7-tests.jar avro-mapred-1.7.7-hadoop2.jar base64-2.3.8.jar bcprov-jdk15on-1.51.jar chill_2.10-0.5.0.jar chill-java-0.5.0.jar commons-beanutils-1.7.0.jar commons-beanutils-core-1.8.0.jar commons-cli-1.2.jar commons-codec-1.10.jar commons-collections-3.2.1.jar commons-compress-1.4.1.jar commons-configuration-1.6.jar commons-digester-1.8.jar commons-httpclient-3.1.jar commons-io-2.4.jar commons-lang-2.6.jar commons-lang3-3.3.2.jar commons-math3-3.4.1.jar commons-net-2.2.jar compress-lzf-1.0.3.jar config-1.2.1.jar curator-client-2.4.0.jar curator-framework-2.4.0.jar curator-recipes-2.4.0.jar gmbal-api-only-3.0.0-b023.jar grizzly-framework-2.1.2.jar grizzly-http-2.1.2.jar grizzly-http-server-2.1.2.jar grizzly-http-servlet-2.1.2.jar grizzly-rcm-2.1.2.jar guice-3.0.jar hadoop-annotations-2.4.0-mmx6.jar hadoop-auth-2.4.0-mmx6.jar hadoop-client-2.4.0-mmx6.jar hadoop-common-2.4.0-mmx6.jar hadoop-hdfs-2.4.0-mmx6.jar hadoop-mapreduce-client-app-2.4.0-mmx6.jar hadoop-mapreduce-client-common-2.4.0-mmx6.jar hadoop-mapreduce-client-core-2.4.0-mmx6.jar hadoop-mapreduce-client-jobclient-2.4.0-mmx6.jar hadoop-mapreduce-client-shuffle-2.4.0-mmx6.jar hadoop-yarn-api-2.2.0.jar hadoop-yarn-client-2.2.0.jar hadoop-yarn-common-2.2.0.jar hadoop-yarn-server-common-2.4.0-mmx6.jar httpclient-4.3.6.jar httpcore-4.3.3.jar ivy-2.4.0.jar jackson-annotations-2.4.0.jar jackson-core-2.4.4.jar jackson-core-asl-1.9.13.jar jackson-databind-2.4.4.jar jackson-jaxrs-1.9.13.jar jackson-mapper-asl-1.9.13.jar jackson-module-scala_2.10-2.4.4.jar jackson-xc-1.9.13.jar javax.inject-1.jar java-xmlbuilder-1.0.jar javax.servlet-3.0.0.v201112011016.jar javax.servlet-3.1.jar javax.servlet-api-3.0.1.jar jaxb-api-2.2.2.jar jaxb-impl-2.2.3-1.jar jcl-over-slf4j-1.7.10.jar jersey-client-1.9.jar jersey-core-1.9.jar jersey-grizzly2-1.9.jar jersey-guice-1.9.jar jersey-json-1.9.jar jersey-server-1.9.jar jersey-test-framework-core-1.9.jar jersey-test-framework-grizzly2-1.9.jar jets3t-0.9.3.jar jettison-1.1.jar jetty-util-6.1.26.jar jline-0.9.94.jar json4s-ast_2.10-3.2.10.jar json4s-core_2.10-3.2.10.jar json4s-jackson_2.10-3.2.10.jar jsr305-1.3.9.jar jul-to-slf4j-1.7.10.jar kryo-2.21.jar log4j-1.2.17.jar lz4-1.3.0.jar mail-1.4.7.jar management-api-3.0.0-b012.jar mesos-0.21.1-shaded-protobuf.jar metrics-core-3.1.2.jar metrics-graphite-3.1.2.jar metrics-json-3.1.2.jar 
metrics-jvm-3.1.2.jar minlog-1.2.jar mx4j-3.0.2.jar netty-3.8.0.Final.jar netty-all-4.0.29.Final.jar objenesis-1.2.jar oro-2.0.8.jar paranamer-2.6.jar protobuf-java-2.5.0.jar py4j-0.8.2.1.jar pyrolite-4.4.jar reflectasm-1.07-shaded.jar RoaringBitmap-0.4.5.jar scala-compiler-2.10.4.jar scala-library-2.10.4.jar scalap-2.10.4.jar scala-reflect-2.10.4.jar slf4j-api-1.7.10.jar slf4j-log4j12-1.7.10.jar snappy-java-1.1.1.7.jar spark-core_2.10-1.5.2-mmx4.jar spark-launcher_2.10-1.5.2-mmx4.jar spark-network-common_2.10-1.5.2-mmx4.jar spark-network-shuffle_2.10-1.5.2-mmx4.jar spark-unsafe_2.10-1.5.2-mmx4.jar stream-2.7.0.jar tachyon-client-0.7.1.jar tachyon-underfs-hdfs-0.7.1.jar tachyon-underfs-local-0.7.1.jar uncommons-maths-1.2.2a.jar unused-1.0.0.jar xmlenc-0.52.jar xz-1.0.jar zookeeper-3.4.5.jar So that's the list of jars spark thinks it needs to get a driver to connect and launch a task. I haven't bothered to go through and clean out the unwanted jars because the classloader isolation is smart (lucky?) enough to where they don't interfere. The point is, I can go replace specific jars to control what versions of stuff are used. For example, I can update mesos versions for the driver independent of spark versions, or change the version of hadoop utilized by spark. There is an argument to be made that enforcing running the Spark test suite against these libs is probably a good
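A minimal sketch of the classloader-isolation pattern described above (illustrative only, not the linked HadoopTask code): load every jar in a directory into its own URLClassLoader, so individual jars can be swapped without rebuilding an assembly.
{code}
// Sketch: directory-based classloader isolation over a flat directory of
// jars like the list above. Not the actual HadoopTask implementation.
import java.io.File
import java.net.{URL, URLClassLoader}

def loaderFor(jarDir: File, parent: ClassLoader): URLClassLoader = {
  val urls: Array[URL] = jarDir.listFiles()
    .filter(_.getName.endsWith(".jar"))
    .map(_.toURI.toURL)
  // The hard part in practice is the delegation policy; a plain
  // URLClassLoader delegates to the parent first.
  new URLClassLoader(urls, parent)
}
{code}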
[jira] [Commented] (SPARK-6305) Add support for log4j 2.x to Spark
[ https://issues.apache.org/jira/browse/SPARK-6305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15289508#comment-15289508 ] Charles Allen commented on SPARK-6305: -- For what it's worth, I went through a similar exercise for druid.io recently. Here's my resulting list of "Hadoop go away" exclusions: https://github.com/druid-io/druid/blob/druid-0.9.0/extensions-core/hdfs-storage/pom.xml#L49 Getting the provided scopes, exclusions, and optional flags sorted out for dependencies is not trivial. And one of the frustrating things with spark is its "screw it, make it a shaded assembly" approach to dependencies (anyone know how to get the new s3a stuff from the hadoop storage extension to work?). Not sure if there is an overall epic of "handle jar dependencies better" but I think this ask would fit better under that than simply a blanket update of what slf4j impl spark wants to use. > Add support for log4j 2.x to Spark > -- > > Key: SPARK-6305 > URL: https://issues.apache.org/jira/browse/SPARK-6305 > Project: Spark > Issue Type: Improvement > Components: Build >Reporter: Tal Sliwowicz >Priority: Minor > > log4j 2 requires replacing the slf4j binding and adding the log4j jars in the > classpath. Since there are shaded jars, it must be done during the build. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
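The linked pom expresses the exclusions in Maven; for sbt builds the equivalent "Hadoop go away" rule would look roughly like this (coordinates are examples, not copied from the druid pom):
{code}
// Illustrative sbt equivalent of a Maven exclusion block.
libraryDependencies += ("org.apache.hadoop" % "hadoop-client" % "2.4.0")
  .exclude("javax.servlet", "servlet-api")
  .excludeAll(ExclusionRule(organization = "org.mortbay.jetty"))
{code}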
[jira] [Updated] (SPARK-14537) [CORE] SparkContext init hangs if master removes application before backend is ready.
[ https://issues.apache.org/jira/browse/SPARK-14537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Charles Allen updated SPARK-14537: -- Description: During the course of the init of the spark context, the following code is executed. {code} setupAndStartListenerBus() postEnvironmentUpdate() postApplicationStart() // Post init _taskScheduler.postStartHook() _env.metricsSystem.registerSource(new BlockManagerSource(_env.blockManager)) _executorAllocationManager.foreach { e => _env.metricsSystem.registerSource(e.executorAllocationManagerSource) } {code} If the _taskScheduler.postStartHook() is waiting for a signal from the backend that it is ready, AND the driver is disconnected from the master scheduler due to a message similar to the one below: {code} ERROR [sparkDriver-akka.actor.default-dispatcher-20] org.apache.spark.rpc.akka.AkkaRpcEnv - Ignore error: Exiting due to error from cluster scheduler: Master removed our application: FAILED org.apache.spark.SparkException: Exiting due to error from cluster scheduler: Master removed our application: FAILED at org.apache.spark.scheduler.TaskSchedulerImpl.error(TaskSchedulerImpl.scala:431) ~[spark-core_2.10-1.5.2-mmx1.jar:1.5.2-mmx1] at org.apache.spark.scheduler.cluster.SparkDeploySchedulerBackend.dead(SparkDeploySchedulerBackend.scala:122) ~[spark-core_2.10-1.5.2-mmx1.jar:1.5.2-mmx1] at org.apache.spark.deploy.client.AppClient$ClientEndpoint.markDead(AppClient.scala:243) ~[spark-core_2.10-1.5.2-mmx1.jar:1.5.2-mmx1] at org.apache.spark.deploy.client.AppClient$ClientEndpoint$$anonfun$receive$1.applyOrElse(AppClient.scala:167) ~[spark-core_2.10-1.5.2-mmx1.jar:1.5.2-mmx1] at org.apache.spark.rpc.akka.AkkaRpcEnv.org$apache$spark$rpc$akka$AkkaRpcEnv$$processMessage(AkkaRpcEnv.scala:177) [spark-core_2.10-1.5.2-mmx1.jar:1.5.2-mmx1] at org.apache.spark.rpc.akka.AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1$$anonfun$receiveWithLogging$1$$anonfun$applyOrElse$4.apply$mcV$sp(AkkaRpcEnv.scala:126) ~[spark-core_2.10-1.5.2-mmx1.jar:1.5.2-mmx1] at org.apache.spark.rpc.akka.AkkaRpcEnv.org$apache$spark$rpc$akka$AkkaRpcEnv$$safelyCall(AkkaRpcEnv.scala:197) [spark-core_2.10-1.5.2-mmx1.jar:1.5.2-mmx1] at org.apache.spark.rpc.akka.AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1$$anonfun$receiveWithLogging$1.applyOrElse(AkkaRpcEnv.scala:125) [spark-core_2.10-1.5.2-mmx1.jar:1.5.2-mmx1] at scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33) [scala-library-2.10.5.jar:?] at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33) [scala-library-2.10.5.jar:?] at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25) [scala-library-2.10.5.jar:?] at org.apache.spark.util.ActorLogReceive$$anon$1.apply(ActorLogReceive.scala:59) [spark-core_2.10-1.5.2-mmx1.jar:1.5.2-mmx1] at org.apache.spark.util.ActorLogReceive$$anon$1.apply(ActorLogReceive.scala:42) [spark-core_2.10-1.5.2-mmx1.jar:1.5.2-mmx1] at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118) [scala-library-2.10.5.jar:?] at org.apache.spark.util.ActorLogReceive$$anon$1.applyOrElse(ActorLogReceive.scala:42) [spark-core_2.10-1.5.2-mmx1.jar:1.5.2-mmx1] at akka.actor.Actor$class.aroundReceive(Actor.scala:467) [akka-actor_2.10-2.3.11.jar:?] 
at org.apache.spark.rpc.akka.AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1.aroundReceive(AkkaRpcEnv.scala:92) [spark-core_2.10-1.5.2-mmx1.jar:1.5.2-mmx1] at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516) [akka-actor_2.10-2.3.11.jar:?] at akka.actor.ActorCell.invoke(ActorCell.scala:487) [akka-actor_2.10-2.3.11.jar:?] at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238) [akka-actor_2.10-2.3.11.jar:?] at akka.dispatch.Mailbox.run(Mailbox.scala:220) [akka-actor_2.10-2.3.11.jar:?] at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397) [akka-actor_2.10-2.3.11.jar:?] at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) [scala-library-2.10.5.jar:?] at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) [scala-library-2.10.5.jar:?] at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) [scala-library-2.10.5.jar:?] at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) [scala-library-2.10.5.jar:?] {code} Then the SparkContext will hang on init because the wait for the backend to be ready never checks that the context is still running: {code:title=TaskSchedulerImpl.scala} private def waitBackendReady(): Unit = { if (backend.isReady) { return
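The quoted method is cut off above. As a sketch of the kind of guard that would avoid the hang (assuming the scheduler can observe a stopped flag, which the snippet does not show), the wait loop could re-check liveness on every iteration:
{code}
// Sketch of a possible fix, not the actual Spark patch: poll the backend,
// but bail out if the scheduler was torn down in the meantime.
private def waitBackendReady(): Unit = {
  if (backend.isReady) {
    return
  }
  while (!backend.isReady) {
    // 'stopped' is assumed to be set when the scheduler shuts down,
    // e.g. after "Master removed our application: FAILED".
    if (stopped) {
      throw new IllegalStateException(
        "Spark context stopped while waiting for backend")
    }
    synchronized { this.wait(100) }
  }
}
{code}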
[jira] [Created] (SPARK-14537) [CORE] SparkContext init hangs if master removes application before backend is ready.
Charles Allen created SPARK-14537: - Summary: [CORE] SparkContext init hangs if master removes application before backend is ready. Key: SPARK-14537 URL: https://issues.apache.org/jira/browse/SPARK-14537 Project: Spark Issue Type: Bug Components: Scheduler Affects Versions: 1.5.2 Reporter: Charles Allen During the course of the init of the spark context, the following code is executed. {code:scala} setupAndStartListenerBus() postEnvironmentUpdate() postApplicationStart() // Post init _taskScheduler.postStartHook() _env.metricsSystem.registerSource(new BlockManagerSource(_env.blockManager)) _executorAllocationManager.foreach { e => _env.metricsSystem.registerSource(e.executorAllocationManagerSource) } {code} If the _taskScheduler.postStartHook() is waiting for a signal from the backend that it is ready, AND the driver is disconnected from the master scheduler due to a message similar to the one below: {code} ERROR [sparkDriver-akka.actor.default-dispatcher-20] org.apache.spark.rpc.akka.AkkaRpcEnv - Ignore error: Exiting due to error from cluster scheduler: Master removed our application: FAILED org.apache.spark.SparkException: Exiting due to error from cluster scheduler: Master removed our application: FAILED at org.apache.spark.scheduler.TaskSchedulerImpl.error(TaskSchedulerImpl.scala:431) ~[spark-core_2.10-1.5.2-mmx1.jar:1.5.2-mmx1] at org.apache.spark.scheduler.cluster.SparkDeploySchedulerBackend.dead(SparkDeploySchedulerBackend.scala:122) ~[spark-core_2.10-1.5.2-mmx1.jar:1.5.2-mmx1] at org.apache.spark.deploy.client.AppClient$ClientEndpoint.markDead(AppClient.scala:243) ~[spark-core_2.10-1.5.2-mmx1.jar:1.5.2-mmx1] at org.apache.spark.deploy.client.AppClient$ClientEndpoint$$anonfun$receive$1.applyOrElse(AppClient.scala:167) ~[spark-core_2.10-1.5.2-mmx1.jar:1.5.2-mmx1] at org.apache.spark.rpc.akka.AkkaRpcEnv.org$apache$spark$rpc$akka$AkkaRpcEnv$$processMessage(AkkaRpcEnv.scala:177) [spark-core_2.10-1.5.2-mmx1.jar:1.5.2-mmx1] at org.apache.spark.rpc.akka.AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1$$anonfun$receiveWithLogging$1$$anonfun$applyOrElse$4.apply$mcV$sp(AkkaRpcEnv.scala:126) ~[spark-core_2.10-1.5.2-mmx1.jar:1.5.2-mmx1] at org.apache.spark.rpc.akka.AkkaRpcEnv.org$apache$spark$rpc$akka$AkkaRpcEnv$$safelyCall(AkkaRpcEnv.scala:197) [spark-core_2.10-1.5.2-mmx1.jar:1.5.2-mmx1] at org.apache.spark.rpc.akka.AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1$$anonfun$receiveWithLogging$1.applyOrElse(AkkaRpcEnv.scala:125) [spark-core_2.10-1.5.2-mmx1.jar:1.5.2-mmx1] at scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33) [scala-library-2.10.5.jar:?] at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33) [scala-library-2.10.5.jar:?] at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25) [scala-library-2.10.5.jar:?] at org.apache.spark.util.ActorLogReceive$$anon$1.apply(ActorLogReceive.scala:59) [spark-core_2.10-1.5.2-mmx1.jar:1.5.2-mmx1] at org.apache.spark.util.ActorLogReceive$$anon$1.apply(ActorLogReceive.scala:42) [spark-core_2.10-1.5.2-mmx1.jar:1.5.2-mmx1] at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118) [scala-library-2.10.5.jar:?] at org.apache.spark.util.ActorLogReceive$$anon$1.applyOrElse(ActorLogReceive.scala:42) [spark-core_2.10-1.5.2-mmx1.jar:1.5.2-mmx1] at akka.actor.Actor$class.aroundReceive(Actor.scala:467) [akka-actor_2.10-2.3.11.jar:?] 
at org.apache.spark.rpc.akka.AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1.aroundReceive(AkkaRpcEnv.scala:92) [spark-core_2.10-1.5.2-mmx1.jar:1.5.2-mmx1] at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516) [akka-actor_2.10-2.3.11.jar:?] at akka.actor.ActorCell.invoke(ActorCell.scala:487) [akka-actor_2.10-2.3.11.jar:?] at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238) [akka-actor_2.10-2.3.11.jar:?] at akka.dispatch.Mailbox.run(Mailbox.scala:220) [akka-actor_2.10-2.3.11.jar:?] at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397) [akka-actor_2.10-2.3.11.jar:?] at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) [scala-library-2.10.5.jar:?] at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) [scala-library-2.10.5.jar:?] at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) [scala-library-2.10.5.jar:?] at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) [scala-library-2.10.5.jar:?] {code} Then the SparkContext will hang on
[jira] [Created] (SPARK-13085) Add scalastyle command used in build testing
Charles Allen created SPARK-13085: - Summary: Add scalastyle command used in build testing Key: SPARK-13085 URL: https://issues.apache.org/jira/browse/SPARK-13085 Project: Spark Issue Type: Wish Components: Build, Tests Reporter: Charles Allen As an occasional or new contributor, it is easy to screw up scala style. But looking at the output logs (for example https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50300/consoleFull ) it is not obvious how to fix the scala style tests, even when reading the scala guide. {code} Running Scala style checks Scalastyle checks failed at following occurrences: [error] /home/jenkins/workspace/SparkPullRequestBuilder/core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/CoarseMesosSchedulerBackend.scala:22:0: import.ordering.wrongOrderInGroup.message [error] (core/compile:scalastyle) errors exist [error] Total time: 9 s, completed Jan 28, 2016 2:11:00 PM [error] running /home/jenkins/workspace/SparkPullRequestBuilder/dev/lint-scala ; received return code 1 {code} This ask is that the command used to check scalastyle is presented in the log so a developer does not have to wait for the build process to check if a pull request should pass scala style checks. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-13085) Add scalastyle command used in build testing
[ https://issues.apache.org/jira/browse/SPARK-13085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Charles Allen updated SPARK-13085: -- Description: As an occasional or new contributor, it is easy to screw up scala style. But looking at the output logs (for example https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50300/consoleFull ) it is not obvious how to fix the scala style tests, even when reading the scala style guide. {code} Running Scala style checks Scalastyle checks failed at following occurrences: [error] /home/jenkins/workspace/SparkPullRequestBuilder/core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/CoarseMesosSchedulerBackend.scala:22:0: import.ordering.wrongOrderInGroup.message [error] (core/compile:scalastyle) errors exist [error] Total time: 9 s, completed Jan 28, 2016 2:11:00 PM [error] running /home/jenkins/workspace/SparkPullRequestBuilder/dev/lint-scala ; received return code 1 {code} This ask is that the command used to check scalastyle is presented in the log so a developer does not have to wait for the build process to check if a pull request should pass scala style checks. was: As an occasional or new contributor, it is easy to screw up scala style. But looking at the output logs (for example https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50300/consoleFull ) it is not obvious how to fix the scala style tests, even when reading the scala guide. {code} Running Scala style checks Scalastyle checks failed at following occurrences: [error] /home/jenkins/workspace/SparkPullRequestBuilder/core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/CoarseMesosSchedulerBackend.scala:22:0: import.ordering.wrongOrderInGroup.message [error] (core/compile:scalastyle) errors exist [error] Total time: 9 s, completed Jan 28, 2016 2:11:00 PM [error] running /home/jenkins/workspace/SparkPullRequestBuilder/dev/lint-scala ; received return code 1 {code} This ask is that the command used to check scalastyle is presented in the log so a developer does not have to wait for the build process to check if a pull request should pass scala style checks. > Add scalastyle command used in build testing > > > Key: SPARK-13085 > URL: https://issues.apache.org/jira/browse/SPARK-13085 > Project: Spark > Issue Type: Wish > Components: Build, Tests >Reporter: Charles Allen > > As an occasional or new contributor, it is easy to screw up scala style. But > looking at the output logs (for example > https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50300/consoleFull > ) it is not obvious how to fix the scala style tests, even when reading the > scala style guide. > {code} > > Running Scala style checks > > Scalastyle checks failed at following occurrences: > [error] > /home/jenkins/workspace/SparkPullRequestBuilder/core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/CoarseMesosSchedulerBackend.scala:22:0: > import.ordering.wrongOrderInGroup.message > [error] (core/compile:scalastyle) errors exist > [error] Total time: 9 s, completed Jan 28, 2016 2:11:00 PM > [error] running > /home/jenkins/workspace/SparkPullRequestBuilder/dev/lint-scala ; received > return code 1 > {code} > This ask is that the command used to check scalastyle is presented in the log > so a developer does not have to wait for the build process to check if a pull > request should pass scala style checks. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-13085) Add scalastyle command used in build testing
[ https://issues.apache.org/jira/browse/SPARK-13085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15124518#comment-15124518 ] Charles Allen commented on SPARK-13085: --- I wanted to know what command was failing the build and it was not obvious from the build logs. > Add scalastyle command used in build testing > > > Key: SPARK-13085 > URL: https://issues.apache.org/jira/browse/SPARK-13085 > Project: Spark > Issue Type: Wish > Components: Build, Tests >Reporter: Charles Allen > > As an occasional or new contributor, it is easy to screw up scala style. But > looking at the output logs (for example > https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50300/consoleFull > ) it is not obvious how to fix the scala style tests, even when reading the > scala style guide. > {code} > > Running Scala style checks > > Scalastyle checks failed at following occurrences: > [error] > /home/jenkins/workspace/SparkPullRequestBuilder/core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/CoarseMesosSchedulerBackend.scala:22:0: > import.ordering.wrongOrderInGroup.message > [error] (core/compile:scalastyle) errors exist > [error] Total time: 9 s, completed Jan 28, 2016 2:11:00 PM > [error] running > /home/jenkins/workspace/SparkPullRequestBuilder/dev/lint-scala ; received > return code 1 > {code} > This ask is that the command used to check scalastyle is presented in the log > so a developer does not have to wait for the build process to check if a pull > request should pass scala style checks. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-13085) Add scalastyle command used in build testing
[ https://issues.apache.org/jira/browse/SPARK-13085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15123946#comment-15123946 ] Charles Allen commented on SPARK-13085: --- {code} mvn scalastyle:check {code} was able to produce a similar error, but it is not obvious if that is the same command the build uses. > Add scalastyle command used in build testing > > > Key: SPARK-13085 > URL: https://issues.apache.org/jira/browse/SPARK-13085 > Project: Spark > Issue Type: Wish > Components: Build, Tests >Reporter: Charles Allen > > As an occasional or new contributor, it is easy to screw up scala style. But > looking at the output logs (for example > https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50300/consoleFull > ) it is not obvious how to fix the scala style tests, even when reading the > scala style guide. > {code} > > Running Scala style checks > > Scalastyle checks failed at following occurrences: > [error] > /home/jenkins/workspace/SparkPullRequestBuilder/core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/CoarseMesosSchedulerBackend.scala:22:0: > import.ordering.wrongOrderInGroup.message > [error] (core/compile:scalastyle) errors exist > [error] Total time: 9 s, completed Jan 28, 2016 2:11:00 PM > [error] running > /home/jenkins/workspace/SparkPullRequestBuilder/dev/lint-scala ; received > return code 1 > {code} > This ask is that the command used to check scalastyle is presented in the log > so a developer does not have to wait for the build process to check if a pull > request should pass scala style checks. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
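For anyone landing here: the failing build log itself names the script the pull request builder runs, so the check can likely be reproduced locally from a Spark checkout.
{code}
# Shown verbatim in the build log as the failing step:
./dev/lint-scala
# Inferred (not confirmed) sbt equivalent, based on the
# "(core/compile:scalastyle)" line in the output:
build/sbt scalastyle
{code}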
[jira] [Commented] (SPARK-1865) Improve behavior of cleanup of disk state
[ https://issues.apache.org/jira/browse/SPARK-1865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15068542#comment-15068542 ] Charles Allen commented on SPARK-1865: -- This is compounded by the fact that some of the shutdown processes will stop() during the normal course of the main thread, then will fail to wait for completion if stop() is ALSO called via the shutdown hook. > Improve behavior of cleanup of disk state > - > > Key: SPARK-1865 > URL: https://issues.apache.org/jira/browse/SPARK-1865 > Project: Spark > Issue Type: Improvement > Components: Deploy, Spark Core >Reporter: Aaron Davidson > > Right now the behavior of disk cleanup is centered around the exit hook of > the executor, which attempts to cleanup shuffle files and disk manager > blocks, but may fail. We should make this behavior more predictable, perhaps > by letting the Standalone Worker cleanup the disk state, and adding a flag to > disable having the executor cleanup its own state. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
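A sketch of the pattern that avoids that double-stop race (illustrative, not Spark's actual shutdown code): make stop() idempotent and have every caller, including the shutdown hook, block until the first invocation finishes its cleanup.
{code}
// Illustrative sketch: idempotent stop() that later callers (e.g. a JVM
// shutdown hook) wait on, instead of returning before cleanup completes.
import java.util.concurrent.CountDownLatch
import java.util.concurrent.atomic.AtomicBoolean

class Stoppable {
  private val stopping = new AtomicBoolean(false)
  private val stopped = new CountDownLatch(1)

  def stop(): Unit = {
    if (stopping.compareAndSet(false, true)) {
      try doCleanup() finally stopped.countDown()
    } else {
      stopped.await() // later callers wait for the first stop() to finish
    }
  }

  private def doCleanup(): Unit = {
    // delete shuffle files, block manager directories, etc.
  }
}
{code}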
[jira] [Commented] (SPARK-12330) Mesos coarse executor does not cleanup blockmgr properly on termination if data is stored on disk
[ https://issues.apache.org/jira/browse/SPARK-12330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15058819#comment-15058819 ] Charles Allen commented on SPARK-12330: --- Looks like the mesos coarse scheduler underwent a lot of changes in 1.6 vs 1.5. In the 1.6 branch I'm getting errors when the coarse tasks terminate, and the tasks are reported as failed. https://github.com/metamx/spark/commit/338f511e3f6ef03f457555d838ed3a694a77dece That same stuff against 1.5.1 has a nice and clean shutdown of the executors on mesos, reporting FINISHED instead of FAILED or KILLED. > Mesos coarse executor does not cleanup blockmgr properly on termination if > data is stored on disk > - > > Key: SPARK-12330 > URL: https://issues.apache.org/jira/browse/SPARK-12330 > Project: Spark > Issue Type: Bug > Components: Block Manager, Mesos >Affects Versions: 1.5.1 >Reporter: Charles Allen > > A simple line count example can be launched as similar to > {code} > SPARK_HOME=/mnt/tmp/spark > MASTER=mesos://zk://zk.metamx-prod.com:2181/mesos-druid/metrics > ./bin/spark-shell --conf spark.mesos.coarse=true --conf spark.cores.max=7 > --conf spark.mesos.executor.memoryOverhead=2048 --conf > spark.mesos.executor.home=/mnt/tmp/spark --conf > spark.executor.extraJavaOptions='-Duser.timezone=UTC -Dfile.encoding=UTF-8 > -XX:+UseParallelGC -XX:+UseParallelOldGC -XX:ParallelGCThreads=8 > -XX:+PrintGCApplicationStoppedTime -XX:+PrintTenuringDistribution > -XX:+PrintFlagsFinal -XX:+PrintAdaptiveSizePolicy -XX:+PrintReferenceGC > -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:MaxDirectMemorySize=1024m > -verbose:gc -XX:+PrintFlagsFinal -Djava.io.tmpdir=/mnt/tmp/scratch' --conf > spark.hadoop.fs.s3n.awsAccessKeyId='REDACTED' --conf > spark.hadoop.fs.s3n.awsSecretAccessKey='REDACTED' --conf > spark.executor.memory=7g --conf spark.executorEnv.GLOG_v=9 --conf > spark.storage.memoryFraction=0.0 --conf spark.shuffle.memoryFraction=0.0 > {code} > In the shell the following lines can be executed: > {code} > val text_file = > sc.textFile("s3n://REDACTED/charlesallen/tpch/lineitem.tbl").persist(org.apache.spark.storage.StorageLevel.DISK_ONLY) > {code} > {code} > text_file.map(l => 1).sum > {code} > which will result in > {code} > res0: Double = 6001215.0 > {code} > for the TPCH 1GB dataset > Unfortunately the blockmgr directory remains on the executor node after > termination of the spark context.
> The log on the executor looks like this near the termination: > {code} > I1215 02:12:31.190878 130732 process.cpp:566] Parsed message name > 'mesos.internal.ShutdownExecutorMessage' for executor(1)@172.19.67.30:58604 > from slave(1)@172.19.67.30:5051 > I1215 02:12:31.190928 130732 process.cpp:2382] Spawned process > __http__(4)@172.19.67.30:58604 > I1215 02:12:31.190932 130721 process.cpp:2392] Resuming > executor(1)@172.19.67.30:58604 at 2015-12-15 02:12:31.190924800+00:00 > I1215 02:12:31.190958 130702 process.cpp:2392] Resuming > __http__(4)@172.19.67.30:58604 at 2015-12-15 02:12:31.190951936+00:00 > I1215 02:12:31.190976 130721 exec.cpp:381] Executor asked to shutdown > I1215 02:12:31.190943 130727 process.cpp:2392] Resuming > __gc__@172.19.67.30:58604 at 2015-12-15 02:12:31.190937088+00:00 > I1215 02:12:31.190991 130702 process.cpp:2497] Cleaning up > __http__(4)@172.19.67.30:58604 > I1215 02:12:31.191032 130721 process.cpp:2382] Spawned process > (2)@172.19.67.30:58604 > I1215 02:12:31.191040 130702 process.cpp:2392] Resuming > (2)@172.19.67.30:58604 at 2015-12-15 02:12:31.191037952+00:00 > I1215 02:12:31.191054 130702 exec.cpp:80] Scheduling shutdown of the executor > I1215 02:12:31.191069 130721 exec.cpp:396] Executor::shutdown took 21572ns > I1215 02:12:31.191073 130702 clock.cpp:260] Created a timer for > (2)@172.19.67.30:58604 in 5secs in the future (2015-12-15 > 02:12:36.191062016+00:00) > I1215 02:12:31.191066 130720 process.cpp:2392] Resuming > (1)@172.19.67.30:58604 at 2015-12-15 02:12:31.191059200+00:00 > 15/12/15 02:12:31 INFO CoarseGrainedExecutorBackend: Driver commanded a > shutdown > I1215 02:12:31.240103 130732 clock.cpp:151] Handling timers up to 2015-12-15 > 02:12:31.240091136+00:00 > I1215 02:12:31.240123 130732 clock.cpp:158] Have timeout(s) at 2015-12-15 > 02:12:31.240036096+00:00 > I1215 02:12:31.240183 130730 process.cpp:2392] Resuming > reaper(1)@172.19.67.30:58604 at 2015-12-15 02:12:31.240178176+00:00 > I1215 02:12:31.240226 130730 clock.cpp:260] Created a timer for > reaper(1)@172.19.67.30:58604 in 100ms in the future (2015-12-15 > 02:12:31.340212992+00:00) > I1215 02:12:31.247019 130720 clock.cpp:260] Created a timer for > (1)@172.19.67.30:58604 in 3secs in the future (2015-12-15 > 02:12:34.247005952+00:00) > 15/12/15 02:12:31 ERROR
[jira] [Commented] (SPARK-12330) Mesos coarse executor does not cleanup blockmgr properly on termination if data is stored on disk
[ https://issues.apache.org/jira/browse/SPARK-12330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15058729#comment-15058729 ] Charles Allen commented on SPARK-12330: --- This is because the CoarseMesosSchedulerBackend does not wait for the environment to report tasks finished before shutting down the mesos driver. I have a fix that seems to be working against 1.5.1. will see if I can cherry-pick it to master > Mesos coarse executor does not cleanup blockmgr properly on termination if > data is stored on disk > - > > Key: SPARK-12330 > URL: https://issues.apache.org/jira/browse/SPARK-12330 > Project: Spark > Issue Type: Bug > Components: Block Manager, Mesos >Affects Versions: 1.5.1 >Reporter: Charles Allen > > A simple line count example can be launched as similar to > {code} > SPARK_HOME=/mnt/tmp/spark > MASTER=mesos://zk://zk.metamx-prod.com:2181/mesos-druid/metrics > ./bin/spark-shell --conf spark.mesos.coarse=true --conf spark.cores.max=7 > --conf spark.mesos.executor.memoryOverhead=2048 --conf > spark.mesos.executor.home=/mnt/tmp/spark --conf > spark.executor.extraJavaOptions='-Duser.timezone=UTC -Dfile.encoding=UTF-8 > -XX:+UseParallelGC -XX:+UseParallelOldGC -XX:ParallelGCThreads=8 > -XX:+PrintGCApplicationStoppedTime -XX:+PrintTenuringDistribution > -XX:+PrintFlagsFinal -XX:+PrintAdaptiveSizePolicy -XX:+PrintReferenceGC > -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:MaxDirectMemorySize=1024m > -verbose:gc -XX:+PrintFlagsFinal -Djava.io.tmpdir=/mnt/tmp/scratch' --conf > spark.hadoop.fs.s3n.awsAccessKeyId='REDACTED' --conf > spark.hadoop.fs.s3n.awsSecretAccessKey='REDACTED' --conf > spark.executor.memory=7g --conf spark.executorEnv.GLOG_v=9 --conf > spark.storage.memoryFraction=0.0 --conf spark.shuffle.memoryFraction=0.0 > {code} > In the shell the following lines can be executed: > {code} > val text_file = > sc.textFile("s3n://REDACTED/charlesallen/tpch/lineitem.tbl").persist(org.apache.spark.storage.StorageLevel.DISK_ONLY) > {code} > {code} > text_file.map(l => 1).sum > {code} > which will result in > {code} > res0: Double = 6001215.0 > {code} > for the TPCH 1GB dataset > Unfortunately the blockmgr directory remains on the executor node after > termination of the spark context. 
> The log on the executor looks like this near the termination: > {code} > I1215 02:12:31.190878 130732 process.cpp:566] Parsed message name > 'mesos.internal.ShutdownExecutorMessage' for executor(1)@172.19.67.30:58604 > from slave(1)@172.19.67.30:5051 > I1215 02:12:31.190928 130732 process.cpp:2382] Spawned process > __http__(4)@172.19.67.30:58604 > I1215 02:12:31.190932 130721 process.cpp:2392] Resuming > executor(1)@172.19.67.30:58604 at 2015-12-15 02:12:31.190924800+00:00 > I1215 02:12:31.190958 130702 process.cpp:2392] Resuming > __http__(4)@172.19.67.30:58604 at 2015-12-15 02:12:31.190951936+00:00 > I1215 02:12:31.190976 130721 exec.cpp:381] Executor asked to shutdown > I1215 02:12:31.190943 130727 process.cpp:2392] Resuming > __gc__@172.19.67.30:58604 at 2015-12-15 02:12:31.190937088+00:00 > I1215 02:12:31.190991 130702 process.cpp:2497] Cleaning up > __http__(4)@172.19.67.30:58604 > I1215 02:12:31.191032 130721 process.cpp:2382] Spawned process > (2)@172.19.67.30:58604 > I1215 02:12:31.191040 130702 process.cpp:2392] Resuming > (2)@172.19.67.30:58604 at 2015-12-15 02:12:31.191037952+00:00 > I1215 02:12:31.191054 130702 exec.cpp:80] Scheduling shutdown of the executor > I1215 02:12:31.191069 130721 exec.cpp:396] Executor::shutdown took 21572ns > I1215 02:12:31.191073 130702 clock.cpp:260] Created a timer for > (2)@172.19.67.30:58604 in 5secs in the future (2015-12-15 > 02:12:36.191062016+00:00) > I1215 02:12:31.191066 130720 process.cpp:2392] Resuming > (1)@172.19.67.30:58604 at 2015-12-15 02:12:31.191059200+00:00 > 15/12/15 02:12:31 INFO CoarseGrainedExecutorBackend: Driver commanded a > shutdown > I1215 02:12:31.240103 130732 clock.cpp:151] Handling timers up to 2015-12-15 > 02:12:31.240091136+00:00 > I1215 02:12:31.240123 130732 clock.cpp:158] Have timeout(s) at 2015-12-15 > 02:12:31.240036096+00:00 > I1215 02:12:31.240183 130730 process.cpp:2392] Resuming > reaper(1)@172.19.67.30:58604 at 2015-12-15 02:12:31.240178176+00:00 > I1215 02:12:31.240226 130730 clock.cpp:260] Created a timer for > reaper(1)@172.19.67.30:58604 in 100ms in the future (2015-12-15 > 02:12:31.340212992+00:00) > I1215 02:12:31.247019 130720 clock.cpp:260] Created a timer for > (1)@172.19.67.30:58604 in 3secs in the future (2015-12-15 > 02:12:34.247005952+00:00) > 15/12/15 02:12:31 ERROR CoarseGrainedExecutorBackend: RECEIVED SIGNAL 15: > SIGTERM > 15/12/15 02:12:31 INFO ShutdownHookManager: Shutdown hook called > no more java logs > {code} > If the
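Independent of the linked commit, the shape of such a fix might look like the following sketch (the bookkeeping names are assumed, not Spark's actual fields): give launched tasks a bounded window to reach a terminal Mesos state before stopping the driver, so executors can run their shutdown hooks and report FINISHED.
{code}
// Sketch only; 'taskStates' is an assumed view of the latest status updates.
import org.apache.mesos.Protos.TaskState
import org.apache.mesos.SchedulerDriver

def stopGracefully(driver: SchedulerDriver,
                   taskStates: () => Iterable[TaskState],
                   timeoutMs: Long): Unit = {
  val terminal = Set(TaskState.TASK_FINISHED, TaskState.TASK_FAILED,
                     TaskState.TASK_KILLED, TaskState.TASK_LOST)
  val deadline = System.currentTimeMillis() + timeoutMs
  while (taskStates().exists(s => !terminal(s)) &&
         System.currentTimeMillis() < deadline) {
    Thread.sleep(100) // statusUpdate callbacks advance taskStates meanwhile
  }
  driver.stop()
}
{code}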
[jira] [Created] (SPARK-12330) Mesos coarse executor does not cleanup blockmgr properly on termination if data is stored on disk
Charles Allen created SPARK-12330: - Summary: Mesos coarse executor does not cleanup blockmgr properly on termination if data is stored on disk Key: SPARK-12330 URL: https://issues.apache.org/jira/browse/SPARK-12330 Project: Spark Issue Type: Bug Components: Block Manager, Mesos Affects Versions: 1.5.1 Reporter: Charles Allen A simple line count example can be launched as similar to {code} SPARK_HOME=/mnt/tmp/spark MASTER=mesos://zk://zk.metamx-prod.com:2181/mesos-druid/metrics ./bin/spark-shell --conf spark.mesos.coarse=true --conf spark.cores.max=7 --conf spark.mesos.executor.memoryOverhead=2048 --conf spark.mesos.executor.home=/mnt/tmp/spark --conf spark.executor.extraJavaOptions='-Duser.timezone=UTC -Dfile.encoding=UTF-8 -XX:+UseParallelGC -XX:+UseParallelOldGC -XX:ParallelGCThreads=8 -XX:+PrintGCApplicationStoppedTime -XX:+PrintTenuringDistribution -XX:+PrintFlagsFinal -XX:+PrintAdaptiveSizePolicy -XX:+PrintReferenceGC -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:MaxDirectMemorySize=1024m -verbose:gc -XX:+PrintFlagsFinal -Djava.io.tmpdir=/mnt/tmp/scratch' --conf spark.hadoop.fs.s3n.awsAccessKeyId='REDACTED' --conf spark.hadoop.fs.s3n.awsSecretAccessKey='REDACTED' --conf spark.executor.memory=7g --conf spark.executorEnv.GLOG_v=9 --conf spark.storage.memoryFraction=0.0 --conf spark.shuffle.memoryFraction=0.0 {code} In the shell the following lines can be executed: {code} val text_file = sc.textFile("s3n://REDACTED/charlesallen/tpch/lineitem.tbl").persist(org.apache.spark.storage.StorageLevel.DISK_ONLY) {code} {code} text_file.map(l => 1).sum {code} which will result in {code} res0: Double = 6001215.0 {code} for the TPCH 1GB dataset Unfortunately the blockmgr directory remains on the executor node after termination of the spark context. The log on the executor looks like this near the termination: {code} I1215 02:12:31.190878 130732 process.cpp:566] Parsed message name 'mesos.internal.ShutdownExecutorMessage' for executor(1)@172.19.67.30:58604 from slave(1)@172.19.67.30:5051 I1215 02:12:31.190928 130732 process.cpp:2382] Spawned process __http__(4)@172.19.67.30:58604 I1215 02:12:31.190932 130721 process.cpp:2392] Resuming executor(1)@172.19.67.30:58604 at 2015-12-15 02:12:31.190924800+00:00 I1215 02:12:31.190958 130702 process.cpp:2392] Resuming __http__(4)@172.19.67.30:58604 at 2015-12-15 02:12:31.190951936+00:00 I1215 02:12:31.190976 130721 exec.cpp:381] Executor asked to shutdown I1215 02:12:31.190943 130727 process.cpp:2392] Resuming __gc__@172.19.67.30:58604 at 2015-12-15 02:12:31.190937088+00:00 I1215 02:12:31.190991 130702 process.cpp:2497] Cleaning up __http__(4)@172.19.67.30:58604 I1215 02:12:31.191032 130721 process.cpp:2382] Spawned process (2)@172.19.67.30:58604 I1215 02:12:31.191040 130702 process.cpp:2392] Resuming (2)@172.19.67.30:58604 at 2015-12-15 02:12:31.191037952+00:00 I1215 02:12:31.191054 130702 exec.cpp:80] Scheduling shutdown of the executor I1215 02:12:31.191069 130721 exec.cpp:396] Executor::shutdown took 21572ns I1215 02:12:31.191073 130702 clock.cpp:260] Created a timer for (2)@172.19.67.30:58604 in 5secs in the future (2015-12-15 02:12:36.191062016+00:00) I1215 02:12:31.191066 130720 process.cpp:2392] Resuming (1)@172.19.67.30:58604 at 2015-12-15 02:12:31.191059200+00:00 15/12/15 02:12:31 INFO CoarseGrainedExecutorBackend: Driver commanded a shutdown I1215 02:12:31.240103 130732 clock.cpp:151] Handling timers up to 2015-12-15 02:12:31.240091136+00:00 I1215 02:12:31.240123 130732 clock.cpp:158] Have timeout(s) at 2015-12-15 02:12:31.240036096+00:00 I1215 
02:12:31.240183 130730 process.cpp:2392] Resuming reaper(1)@172.19.67.30:58604 at 2015-12-15 02:12:31.240178176+00:00 I1215 02:12:31.240226 130730 clock.cpp:260] Created a timer for reaper(1)@172.19.67.30:58604 in 100ms in the future (2015-12-15 02:12:31.340212992+00:00) I1215 02:12:31.247019 130720 clock.cpp:260] Created a timer for (1)@172.19.67.30:58604 in 3secs in the future (2015-12-15 02:12:34.247005952+00:00) 15/12/15 02:12:31 ERROR CoarseGrainedExecutorBackend: RECEIVED SIGNAL 15: SIGTERM 15/12/15 02:12:31 INFO ShutdownHookManager: Shutdown hook called no more java logs {code} If the shuffle fraction is NOT set to 0.0, and the data is allowed to stay in memory, then the following log can be seen at termination instead: {code} I1215 01:19:16.247705 120052 process.cpp:566] Parsed message name 'mesos.internal.ShutdownExecutorMessage' for executor(1)@172.19.67.24:60016 from slave(1)@172.19.67.24:5051 I1215 01:19:16.247745 120052 process.cpp:2382] Spawned process __http__(4)@172.19.67.24:60016 I1215 01:19:16.247747 120034 process.cpp:2392] Resuming executor(1)@172.19.67.24:60016 at 2015-12-15 01:19:16.247741952+00:00 I1215 01:19:16.247758 120030 process.cpp:2392] Resuming __gc__@172.19.67.24:60016 at 2015-12-15
[jira] [Updated] (SPARK-12330) Mesos coarse executor does not cleanup blockmgr properly on termination if data is stored on disk
[ https://issues.apache.org/jira/browse/SPARK-12330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Charles Allen updated SPARK-12330: -- Description: A simple line count example can be launched as similar to {code} SPARK_HOME=/mnt/tmp/spark MASTER=mesos://zk://zk.metamx-prod.com:2181/mesos-druid/metrics ./bin/spark-shell --conf spark.mesos.coarse=true --conf spark.cores.max=7 --conf spark.mesos.executor.memoryOverhead=2048 --conf spark.mesos.executor.home=/mnt/tmp/spark --conf spark.executor.extraJavaOptions='-Duser.timezone=UTC -Dfile.encoding=UTF-8 -XX:+UseParallelGC -XX:+UseParallelOldGC -XX:ParallelGCThreads=8 -XX:+PrintGCApplicationStoppedTime -XX:+PrintTenuringDistribution -XX:+PrintFlagsFinal -XX:+PrintAdaptiveSizePolicy -XX:+PrintReferenceGC -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:MaxDirectMemorySize=1024m -verbose:gc -XX:+PrintFlagsFinal -Djava.io.tmpdir=/mnt/tmp/scratch' --conf spark.hadoop.fs.s3n.awsAccessKeyId='REDACTED' --conf spark.hadoop.fs.s3n.awsSecretAccessKey='REDACTED' --conf spark.executor.memory=7g --conf spark.executorEnv.GLOG_v=9 --conf spark.storage.memoryFraction=0.0 --conf spark.shuffle.memoryFraction=0.0 {code} In the shell the following lines can be executed: {code} val text_file = sc.textFile("s3n://REDACTED/charlesallen/tpch/lineitem.tbl").persist(org.apache.spark.storage.StorageLevel.DISK_ONLY) {code} {code} text_file.map(l => 1).sum {code} which will result in {code} res0: Double = 6001215.0 {code} for the TPCH 1GB dataset Unfortunately the blockmgr directory remains on the executor node after termination of the spark context. The log on the executor looks like this near the termination: {code} I1215 02:12:31.190878 130732 process.cpp:566] Parsed message name 'mesos.internal.ShutdownExecutorMessage' for executor(1)@172.19.67.30:58604 from slave(1)@172.19.67.30:5051 I1215 02:12:31.190928 130732 process.cpp:2382] Spawned process __http__(4)@172.19.67.30:58604 I1215 02:12:31.190932 130721 process.cpp:2392] Resuming executor(1)@172.19.67.30:58604 at 2015-12-15 02:12:31.190924800+00:00 I1215 02:12:31.190958 130702 process.cpp:2392] Resuming __http__(4)@172.19.67.30:58604 at 2015-12-15 02:12:31.190951936+00:00 I1215 02:12:31.190976 130721 exec.cpp:381] Executor asked to shutdown I1215 02:12:31.190943 130727 process.cpp:2392] Resuming __gc__@172.19.67.30:58604 at 2015-12-15 02:12:31.190937088+00:00 I1215 02:12:31.190991 130702 process.cpp:2497] Cleaning up __http__(4)@172.19.67.30:58604 I1215 02:12:31.191032 130721 process.cpp:2382] Spawned process (2)@172.19.67.30:58604 I1215 02:12:31.191040 130702 process.cpp:2392] Resuming (2)@172.19.67.30:58604 at 2015-12-15 02:12:31.191037952+00:00 I1215 02:12:31.191054 130702 exec.cpp:80] Scheduling shutdown of the executor I1215 02:12:31.191069 130721 exec.cpp:396] Executor::shutdown took 21572ns I1215 02:12:31.191073 130702 clock.cpp:260] Created a timer for (2)@172.19.67.30:58604 in 5secs in the future (2015-12-15 02:12:36.191062016+00:00) I1215 02:12:31.191066 130720 process.cpp:2392] Resuming (1)@172.19.67.30:58604 at 2015-12-15 02:12:31.191059200+00:00 15/12/15 02:12:31 INFO CoarseGrainedExecutorBackend: Driver commanded a shutdown I1215 02:12:31.240103 130732 clock.cpp:151] Handling timers up to 2015-12-15 02:12:31.240091136+00:00 I1215 02:12:31.240123 130732 clock.cpp:158] Have timeout(s) at 2015-12-15 02:12:31.240036096+00:00 I1215 02:12:31.240183 130730 process.cpp:2392] Resuming reaper(1)@172.19.67.30:58604 at 2015-12-15 02:12:31.240178176+00:00 I1215 02:12:31.240226 130730 
clock.cpp:260] Created a timer for reaper(1)@172.19.67.30:58604 in 100ms in the future (2015-12-15 02:12:31.340212992+00:00) I1215 02:12:31.247019 130720 clock.cpp:260] Created a timer for (1)@172.19.67.30:58604 in 3secs in the future (2015-12-15 02:12:34.247005952+00:00) 15/12/15 02:12:31 ERROR CoarseGrainedExecutorBackend: RECEIVED SIGNAL 15: SIGTERM 15/12/15 02:12:31 INFO ShutdownHookManager: Shutdown hook called no more java logs {code} If the shuffle fraction is NOT set to 0.0, and the data is allowed to stay in memory, then the following log can be seen at termination instead: {code} I1215 01:19:16.247705 120052 process.cpp:566] Parsed message name 'mesos.internal.ShutdownExecutorMessage' for executor(1)@172.19.67.24:60016 from slave(1)@172.19.67.24:5051 I1215 01:19:16.247745 120052 process.cpp:2382] Spawned process __http__(4)@172.19.67.24:60016 I1215 01:19:16.247747 120034 process.cpp:2392] Resuming executor(1)@172.19.67.24:60016 at 2015-12-15 01:19:16.247741952+00:00 I1215 01:19:16.247758 120030 process.cpp:2392] Resuming __gc__@172.19.67.24:60016 at 2015-12-15 01:19:16.247755008+00:00 I1215 01:19:16.247772 120034 exec.cpp:381] Executor asked to shutdown I1215 01:19:16.247772 120038 process.cpp:2392] Resuming __http__(4)@172.19.67.24:60016 at 2015-12-15 01:19:16.247767808+00:00 I1215 01:19:16.247791 120038 process.cpp:2497]
[jira] [Resolved] (SPARK-12226) Docs for Mesos don't mention shaded protobuf version
[ https://issues.apache.org/jira/browse/SPARK-12226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Charles Allen resolved SPARK-12226. --- Resolution: Won't Fix Closing as won't fix for now since it's not obvious that this is a Spark-supported use case > Docs for Mesos don't mention shaded protobuf version > > > Key: SPARK-12226 > URL: https://issues.apache.org/jira/browse/SPARK-12226 > Project: Spark > Issue Type: Documentation > Components: Mesos >Affects Versions: 1.0.0 >Reporter: Charles Allen >Priority: Minor > > http://spark.apache.org/docs/latest/running-on-mesos.html does not mention > that org.apache.spark.scheduler.cluster.mesos.MesosSchedulerBackend is > compiled against the shaded version of the mesos java library. As such the > need to use mesos--shaded-protobuf.jar instead of mesos-.jar is not > apparent. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-12226) Docs for Mesos don't mention shaded protobuf version
[ https://issues.apache.org/jira/browse/SPARK-12226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15049393#comment-15049393 ] Charles Allen commented on SPARK-12226: --- [~srowen] Since this is in a project that I'm assembling from independent jars, and not using a spark-assembly directly, I've been considering how this requirement (for an arguably corner case) should best be communicated. Putting it on http://spark.apache.org/docs/latest/running-on-mesos.html seems like it might be more confusing than helpful since it really only pertains to people building new spark bundles. Any suggestion on where would be a good place to document it? > Docs for Mesos don't mention shaded protobuf version > > > Key: SPARK-12226 > URL: https://issues.apache.org/jira/browse/SPARK-12226 > Project: Spark > Issue Type: Documentation > Components: Mesos >Affects Versions: 1.0.0 >Reporter: Charles Allen >Priority: Minor > > http://spark.apache.org/docs/latest/running-on-mesos.html does not mention > that org.apache.spark.scheduler.cluster.mesos.MesosSchedulerBackend is > compiled against the shaded version of the mesos java library. As such the > need to use mesos--shaded-protobuf.jar instead of mesos-.jar is not > apparent. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-12248) Make Spark Coarse Mesos Scheduler obey limits on memory/cpu ratios
Charles Allen created SPARK-12248: - Summary: Make Spark Coarse Mesos Scheduler obey limits on memory/cpu ratios Key: SPARK-12248 URL: https://issues.apache.org/jira/browse/SPARK-12248 Project: Spark Issue Type: Improvement Components: Mesos Reporter: Charles Allen It is possible to have spark apps that work best with either more memory or more CPU. In a multi-tenant environment (such as Mesos) it can be very beneficial to limit the Coarse scheduler so that an executor doesn't subscribe to too many cpus or too much memory. This ask is to add functionality to the Coarse Mesos Scheduler to enforce basic limits on the ratio of memory to cpu, defaulting to the current behavior of soaking up whatever resources it can. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
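As a sketch of the requested behavior (the ratio parameter is imagined, not an existing Spark conf): when sizing an executor from an offer, the scheduler would clamp the memory it claims so that memory-per-cpu stays within the configured bound.
{code}
// Hypothetical sketch of a memory-per-cpu cap when claiming offer resources.
case class Claim(cpus: Double, memMb: Double)

def claimFromOffer(offerCpus: Double, offerMemMb: Double,
                   maxMemPerCpuMb: Double): Claim = {
  // Soak up CPUs as before, but never exceed the configured memory ratio.
  val memMb = math.min(offerMemMb, offerCpus * maxMemPerCpuMb)
  Claim(offerCpus, memMb)
}
{code}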
[jira] [Created] (SPARK-12226) Docs for Mesos don't mention shaded protobuf version
Charles Allen created SPARK-12226: - Summary: Docs for Mesos don't mention shaded protobuf version Key: SPARK-12226 URL: https://issues.apache.org/jira/browse/SPARK-12226 Project: Spark Issue Type: Documentation Components: Mesos Affects Versions: 1.0.0 Reporter: Charles Allen http://spark.apache.org/docs/latest/running-on-mesos.html does not mention that org.apache.spark.scheduler.cluster.mesos.MesosSchedulerBackend is compiled against the shaded version of the mesos java library. As such the need to use mesos--shaded-protobuf.jar instead of mesos-.jar is not apparent. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-12226) Docs for Mesos don't mention shaded protobuf version
[ https://issues.apache.org/jira/browse/SPARK-12226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Charles Allen updated SPARK-12226: -- Priority: Minor (was: Major) > Docs for Mesos don't mention shaded protobuf version > > > Key: SPARK-12226 > URL: https://issues.apache.org/jira/browse/SPARK-12226 > Project: Spark > Issue Type: Documentation > Components: Mesos >Affects Versions: 1.0.0 >Reporter: Charles Allen >Priority: Minor > > http://spark.apache.org/docs/latest/running-on-mesos.html does not mention > that org.apache.spark.scheduler.cluster.mesos.MesosSchedulerBackend is > compiled against the shaded version of the mesos java library. As such the > need to use mesos--shaded-protobuf.jar instead of mesos-.jar is not > apparent. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-11016) Spark fails when running with a task that requires a more recent version of RoaringBitmaps
[ https://issues.apache.org/jira/browse/SPARK-11016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15007640#comment-15007640 ] Charles Allen commented on SPARK-11016: --- [~davies] Was in a meeting, looks like you got it :) > Spark fails when running with a task that requires a more recent version of > RoaringBitmaps > -- > > Key: SPARK-11016 > URL: https://issues.apache.org/jira/browse/SPARK-11016 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.4.0 >Reporter: Charles Allen > Fix For: 1.6.0 > > > The following error appears during Kryo init whenever a more recent version > (>0.5.0) of Roaring bitmaps is required by a job. > org/roaringbitmap/RoaringArray$Element was removed in 0.5.0 > {code} > A needed class was not found. This could be due to an error in your runpath. > Missing class: org/roaringbitmap/RoaringArray$Element > java.lang.NoClassDefFoundError: org/roaringbitmap/RoaringArray$Element > at > org.apache.spark.serializer.KryoSerializer$.(KryoSerializer.scala:338) > at > org.apache.spark.serializer.KryoSerializer$.(KryoSerializer.scala) > at > org.apache.spark.serializer.KryoSerializer.newKryo(KryoSerializer.scala:93) > at > org.apache.spark.serializer.KryoSerializerInstance.borrowKryo(KryoSerializer.scala:237) > at > org.apache.spark.serializer.KryoSerializerInstance.(KryoSerializer.scala:222) > at > org.apache.spark.serializer.KryoSerializer.newInstance(KryoSerializer.scala:138) > at > org.apache.spark.broadcast.TorrentBroadcast$.blockifyObject(TorrentBroadcast.scala:201) > at > org.apache.spark.broadcast.TorrentBroadcast.writeBlocks(TorrentBroadcast.scala:102) > at > org.apache.spark.broadcast.TorrentBroadcast.(TorrentBroadcast.scala:85) > at > org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:34) > at > org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:63) > at org.apache.spark.SparkContext.broadcast(SparkContext.scala:1318) > at > org.apache.spark.SparkContext$$anonfun$hadoopFile$1.apply(SparkContext.scala:1006) > at > org.apache.spark.SparkContext$$anonfun$hadoopFile$1.apply(SparkContext.scala:1003) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108) > at org.apache.spark.SparkContext.withScope(SparkContext.scala:700) > at org.apache.spark.SparkContext.hadoopFile(SparkContext.scala:1003) > at > org.apache.spark.SparkContext$$anonfun$textFile$1.apply(SparkContext.scala:818) > at > org.apache.spark.SparkContext$$anonfun$textFile$1.apply(SparkContext.scala:816) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108) > at org.apache.spark.SparkContext.withScope(SparkContext.scala:700) > at org.apache.spark.SparkContext.textFile(SparkContext.scala:816) > {code} > See https://issues.apache.org/jira/browse/SPARK-5949 for related info -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-11714) Make Spark on Mesos honor port restrictions
Charles Allen created SPARK-11714: - Summary: Make Spark on Mesos honor port restrictions Key: SPARK-11714 URL: https://issues.apache.org/jira/browse/SPARK-11714 Project: Spark Issue Type: Improvement Components: Mesos Reporter: Charles Allen Currently the MesosSchedulerBackend makes no effort to honor "ports" as a resource in a Mesos offer. This ask is to have the ports the executor binds to respect the limits of the "ports" resource of an offer. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
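A minimal sketch of what honoring the offer would mean (illustrative only): pick the executor's listen ports from the ranges the offer actually contains, instead of binding to arbitrary ephemeral ports.
{code}
// Illustrative sketch: choose ports from the "ports" ranges of a Mesos offer.
// Mesos Value.Range bounds are inclusive.
def pickPorts(offeredRanges: Seq[(Long, Long)], count: Int): Seq[Long] =
  offeredRanges.iterator
    .flatMap { case (begin, end) => (begin to end).iterator }
    .take(count)
    .toSeq
{code}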
[jira] [Commented] (SPARK-11016) Spark fails when running with a task that requires a more recent version of RoaringBitmaps
[ https://issues.apache.org/jira/browse/SPARK-11016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14964043#comment-14964043 ] Charles Allen commented on SPARK-11016: --- [~srowen] I confirmed locally that https://github.com/metamx/spark/pull/1 prevents this error, but as per your prior comment a "more correct" implementation would probably provide a Kryo Externalizable bridge of some kind. > Spark fails when running with a task that requires a more recent version of > RoaringBitmaps > -- > > Key: SPARK-11016 > URL: https://issues.apache.org/jira/browse/SPARK-11016 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.4.0 >Reporter: Charles Allen > > The following error appears during Kryo init whenever a more recent version > (>0.5.0) of Roaring bitmaps is required by a job. > org/roaringbitmap/RoaringArray$Element was removed in 0.5.0 > {code} > A needed class was not found. This could be due to an error in your runpath. > Missing class: org/roaringbitmap/RoaringArray$Element > java.lang.NoClassDefFoundError: org/roaringbitmap/RoaringArray$Element > at > org.apache.spark.serializer.KryoSerializer$.(KryoSerializer.scala:338) > at > org.apache.spark.serializer.KryoSerializer$.(KryoSerializer.scala) > at > org.apache.spark.serializer.KryoSerializer.newKryo(KryoSerializer.scala:93) > at > org.apache.spark.serializer.KryoSerializerInstance.borrowKryo(KryoSerializer.scala:237) > at > org.apache.spark.serializer.KryoSerializerInstance.(KryoSerializer.scala:222) > at > org.apache.spark.serializer.KryoSerializer.newInstance(KryoSerializer.scala:138) > at > org.apache.spark.broadcast.TorrentBroadcast$.blockifyObject(TorrentBroadcast.scala:201) > at > org.apache.spark.broadcast.TorrentBroadcast.writeBlocks(TorrentBroadcast.scala:102) > at > org.apache.spark.broadcast.TorrentBroadcast.(TorrentBroadcast.scala:85) > at > org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:34) > at > org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:63) > at org.apache.spark.SparkContext.broadcast(SparkContext.scala:1318) > at > org.apache.spark.SparkContext$$anonfun$hadoopFile$1.apply(SparkContext.scala:1006) > at > org.apache.spark.SparkContext$$anonfun$hadoopFile$1.apply(SparkContext.scala:1003) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108) > at org.apache.spark.SparkContext.withScope(SparkContext.scala:700) > at org.apache.spark.SparkContext.hadoopFile(SparkContext.scala:1003) > at > org.apache.spark.SparkContext$$anonfun$textFile$1.apply(SparkContext.scala:818) > at > org.apache.spark.SparkContext$$anonfun$textFile$1.apply(SparkContext.scala:816) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108) > at org.apache.spark.SparkContext.withScope(SparkContext.scala:700) > at org.apache.spark.SparkContext.textFile(SparkContext.scala:816) > {code} > See https://issues.apache.org/jira/browse/SPARK-5949 for related info -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-8142) Spark Job Fails with ResultTask ClassCastException
[ https://issues.apache.org/jira/browse/SPARK-8142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14951238#comment-14951238 ]

Charles Allen commented on SPARK-8142:
--

I hit a similar failure to the one described here and solved it by setting "spark.executor.userClassPathFirst" to "false" and "spark.driver.userClassPathFirst" to "false".

> Spark Job Fails with ResultTask ClassCastException
> --
>
> Key: SPARK-8142
> URL: https://issues.apache.org/jira/browse/SPARK-8142
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 1.3.1
> Reporter: Dev Lakhani
>
> When running a Spark job, I get no failures in the application code whatsoever, just a weird ResultTask cast exception. In my job, I create an RDD from HBase and, for each partition, call a REST API using a REST client. This works in IntelliJ, but when I deploy to a cluster using spark-submit.sh I get:
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3, host): java.lang.ClassCastException: org.apache.spark.scheduler.ResultTask cannot be cast to org.apache.spark.scheduler.Task
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:185)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> These are the configs I set to override the Spark classpath because I want to use my own Glassfish Jersey version:
> sparkConf.set("spark.driver.userClassPathFirst","true");
> sparkConf.set("spark.executor.userClassPathFirst","true");
> I see no other warnings or errors in any of the logs.
> Unfortunately I cannot post my code, but please ask me questions that will help debug the issue. Using Spark 1.3.1 and Hadoop 2.6.
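For reference, a minimal sketch of the workaround from the comment above; both properties default to "false", so this simply avoids the "true" overrides quoted in the issue:

{code}
import org.apache.spark.SparkConf

// Workaround from the comment above: let Spark's own classpath take
// precedence over user jars (these are in fact the defaults, shown
// explicitly for clarity).
val sparkConf = new SparkConf()
  .set("spark.driver.userClassPathFirst", "false")
  .set("spark.executor.userClassPathFirst", "false")
{code}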
[jira] [Commented] (SPARK-11016) Spark fails when running with a task that requires a more recent version of RoaringBitmaps
[ https://issues.apache.org/jira/browse/SPARK-11016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14950651#comment-14950651 ]

Charles Allen commented on SPARK-11016:
---

[~srowen] As mentioned in https://issues.apache.org/jira/browse/SPARK-5949?focusedCommentId=14949819&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14949819, Spark relies on native Kryo serde for RoaringBitmap in KryoSerializer: https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/serializer/KryoSerializer.scala#L368 including the protected Element class: https://github.com/lemire/RoaringBitmap/blob/RoaringBitmap-0.4.5/src/main/java/org/roaringbitmap/RoaringArray.java#L361 which was removed in 0.5.0 (Spark is currently on 0.4.5).

The serde method sanctioned by the RoaringBitmap library is to use the serialize and deserialize methods provided by the RoaringBitmap or RoaringArray object. Relying on the protected class causes conflicts whenever a 0.5.0-or-later version of the RoaringBitmap library is used, because Spark unavoidably fails when it tries to register everything in org.apache.spark.serializer.KryoSerializer#toRegister, including the no-longer-existing protected inner static class.

I took a quick jab at a patch locally by registering RoaringBitmap and RoaringArray with a com.esotericsoftware.kryo.Serializer (a sketch of the idea follows this entry), but it is not clear how close KryoInput and KryoOutput are to DataInput / DataOutput, which means a bridging approach might violate the contract of one or the other.

> Spark fails when running with a task that requires a more recent version of RoaringBitmaps
> --
>
> Key: SPARK-11016
> URL: https://issues.apache.org/jira/browse/SPARK-11016
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 1.4.0
> Reporter: Charles Allen
>
> The following error appears during Kryo init whenever a more recent version (0.5.0 or later) of RoaringBitmap is required by a job. org/roaringbitmap/RoaringArray$Element was removed in 0.5.0.
> {code}
> A needed class was not found. This could be due to an error in your runpath.
> Missing class: org/roaringbitmap/RoaringArray$Element
> java.lang.NoClassDefFoundError: org/roaringbitmap/RoaringArray$Element
> at org.apache.spark.serializer.KryoSerializer$.<init>(KryoSerializer.scala:338)
> at org.apache.spark.serializer.KryoSerializer$.<clinit>(KryoSerializer.scala)
> at org.apache.spark.serializer.KryoSerializer.newKryo(KryoSerializer.scala:93)
> at org.apache.spark.serializer.KryoSerializerInstance.borrowKryo(KryoSerializer.scala:237)
> at org.apache.spark.serializer.KryoSerializerInstance.<init>(KryoSerializer.scala:222)
> at org.apache.spark.serializer.KryoSerializer.newInstance(KryoSerializer.scala:138)
> at org.apache.spark.broadcast.TorrentBroadcast$.blockifyObject(TorrentBroadcast.scala:201)
> at org.apache.spark.broadcast.TorrentBroadcast.writeBlocks(TorrentBroadcast.scala:102)
> at org.apache.spark.broadcast.TorrentBroadcast.<init>(TorrentBroadcast.scala:85)
> at org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:34)
> at org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:63)
> at org.apache.spark.SparkContext.broadcast(SparkContext.scala:1318)
> at org.apache.spark.SparkContext$$anonfun$hadoopFile$1.apply(SparkContext.scala:1006)
> at org.apache.spark.SparkContext$$anonfun$hadoopFile$1.apply(SparkContext.scala:1003)
> at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
> at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
> at org.apache.spark.SparkContext.withScope(SparkContext.scala:700)
> at org.apache.spark.SparkContext.hadoopFile(SparkContext.scala:1003)
> at org.apache.spark.SparkContext$$anonfun$textFile$1.apply(SparkContext.scala:818)
> at org.apache.spark.SparkContext$$anonfun$textFile$1.apply(SparkContext.scala:816)
> at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
> at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
> at org.apache.spark.SparkContext.withScope(SparkContext.scala:700)
> at org.apache.spark.SparkContext.textFile(SparkContext.scala:816)
> {code}
> See https://issues.apache.org/jira/browse/SPARK-5949 for related info
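A rough sketch of the bridging idea mentioned in the comment above, assuming Kryo's Output and Input can be wrapped in Data{Output,Input}Stream (they extend OutputStream / InputStream); whether that wrapping honors both contracts is exactly the open question raised there. RoaringBitmapSerializer is a hypothetical name, not a Spark class:

{code}
import com.esotericsoftware.kryo.{Kryo, Serializer}
import com.esotericsoftware.kryo.io.{Input, Output}
import java.io.{DataInputStream, DataOutputStream}
import org.roaringbitmap.RoaringBitmap

// Bridge Kryo's stream types to the DataOutput/DataInput methods that the
// RoaringBitmap library sanctions for serde, instead of field-level Kryo serde.
class RoaringBitmapSerializer extends Serializer[RoaringBitmap] {
  override def write(kryo: Kryo, output: Output, bitmap: RoaringBitmap): Unit = {
    val dos = new DataOutputStream(output)
    bitmap.serialize(dos)  // self-describing format written via DataOutput
    dos.flush()
  }
  override def read(kryo: Kryo, input: Input, clazz: Class[RoaringBitmap]): RoaringBitmap = {
    val bitmap = new RoaringBitmap()
    bitmap.deserialize(new DataInputStream(input))
    bitmap
  }
}

// Registered in place of the class-list registration, e.g.:
// kryo.register(classOf[RoaringBitmap], new RoaringBitmapSerializer)
{code}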
[jira] [Commented] (SPARK-5949) Driver program has to register roaring bitmap classes used by spark with Kryo when number of partitions is greater than 2000
[ https://issues.apache.org/jira/browse/SPARK-5949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14949525#comment-14949525 ]

Charles Allen commented on SPARK-5949:
--

[~lemire] Pinging to see if you have any suggestions on how to handle situations like this.

> Driver program has to register roaring bitmap classes used by spark with Kryo when number of partitions is greater than 2000
> --
>
> Key: SPARK-5949
> URL: https://issues.apache.org/jira/browse/SPARK-5949
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 1.2.0
> Reporter: Peter Torok
> Assignee: Imran Rashid
> Labels: kryo, partitioning, serialization
> Fix For: 1.4.0
>
> When more than 2000 partitions are being used with Kryo, the following classes need to be registered by the driver program:
> - org.apache.spark.scheduler.HighlyCompressedMapStatus
> - org.roaringbitmap.RoaringBitmap
> - org.roaringbitmap.RoaringArray
> - org.roaringbitmap.ArrayContainer
> - org.roaringbitmap.RoaringArray$Element
> - org.roaringbitmap.RoaringArray$Element[]
> - short[]
> Our project doesn't have a dependency on RoaringBitmap, and HighlyCompressedMapStatus is intended for internal Spark usage. Spark should take care of this registration when Kryo is used.
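Until Spark registers these classes itself, the driver-side workaround implied by the issue is a custom KryoRegistrator; a minimal sketch under that assumption (MyRegistrator is a hypothetical name; Class.forName is used because several of these classes are non-public):

{code}
import com.esotericsoftware.kryo.Kryo
import org.apache.spark.serializer.KryoRegistrator

class MyRegistrator extends KryoRegistrator {
  override def registerClasses(kryo: Kryo): Unit = {
    // The classes the issue lists as needing registration past 2000 partitions.
    kryo.register(Class.forName("org.apache.spark.scheduler.HighlyCompressedMapStatus"))
    kryo.register(Class.forName("org.roaringbitmap.RoaringBitmap"))
    kryo.register(Class.forName("org.roaringbitmap.RoaringArray"))
    kryo.register(Class.forName("org.roaringbitmap.ArrayContainer"))
    kryo.register(Class.forName("org.roaringbitmap.RoaringArray$Element"))
    kryo.register(Class.forName("[Lorg.roaringbitmap.RoaringArray$Element;")) // the array type
    kryo.register(classOf[Array[Short]])
  }
}

// Wired up via: sparkConf.set("spark.kryo.registrator", classOf[MyRegistrator].getName)
{code}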
[jira] [Commented] (SPARK-5949) Driver program has to register roaring bitmap classes used by spark with Kryo when number of partitions is greater than 2000
[ https://issues.apache.org/jira/browse/SPARK-5949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14949518#comment-14949518 ]

Charles Allen commented on SPARK-5949:
--

This breaks when using more recent versions of Roaring, where org.roaringbitmap.RoaringArray$Element is no longer present. The following stack trace appears:

{code}
A needed class was not found. This could be due to an error in your runpath.
Missing class: org/roaringbitmap/RoaringArray$Element
java.lang.NoClassDefFoundError: org/roaringbitmap/RoaringArray$Element
at org.apache.spark.serializer.KryoSerializer$.<init>(KryoSerializer.scala:338)
at org.apache.spark.serializer.KryoSerializer$.<clinit>(KryoSerializer.scala)
at org.apache.spark.serializer.KryoSerializer.newKryo(KryoSerializer.scala:93)
at org.apache.spark.serializer.KryoSerializerInstance.borrowKryo(KryoSerializer.scala:237)
at org.apache.spark.serializer.KryoSerializerInstance.<init>(KryoSerializer.scala:222)
at org.apache.spark.serializer.KryoSerializer.newInstance(KryoSerializer.scala:138)
at org.apache.spark.broadcast.TorrentBroadcast$.blockifyObject(TorrentBroadcast.scala:201)
at org.apache.spark.broadcast.TorrentBroadcast.writeBlocks(TorrentBroadcast.scala:102)
at org.apache.spark.broadcast.TorrentBroadcast.<init>(TorrentBroadcast.scala:85)
at org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:34)
at org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:63)
at org.apache.spark.SparkContext.broadcast(SparkContext.scala:1318)
at org.apache.spark.SparkContext$$anonfun$hadoopFile$1.apply(SparkContext.scala:1006)
at org.apache.spark.SparkContext$$anonfun$hadoopFile$1.apply(SparkContext.scala:1003)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
at org.apache.spark.SparkContext.withScope(SparkContext.scala:700)
at org.apache.spark.SparkContext.hadoopFile(SparkContext.scala:1003)
at org.apache.spark.SparkContext$$anonfun$textFile$1.apply(SparkContext.scala:818)
at org.apache.spark.SparkContext$$anonfun$textFile$1.apply(SparkContext.scala:816)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
at org.apache.spark.SparkContext.withScope(SparkContext.scala:700)
at org.apache.spark.SparkContext.textFile(SparkContext.scala:816)
at io.druid.indexer.spark.SparkDruidIndexer$$anonfun$2.apply(SparkDruidIndexer.scala:84)
at io.druid.indexer.spark.SparkDruidIndexer$$anonfun$2.apply(SparkDruidIndexer.scala:84)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.immutable.List.foreach(List.scala:318)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
at scala.collection.AbstractTraversable.map(Traversable.scala:105)
at io.druid.indexer.spark.SparkDruidIndexer$.loadData(SparkDruidIndexer.scala:84)
at io.druid.indexer.spark.TestSparkDruidIndexer$$anonfun$1.apply$mcV$sp(TestSparkDruidIndexer.scala:131)
at io.druid.indexer.spark.TestSparkDruidIndexer$$anonfun$1.apply(TestSparkDruidIndexer.scala:40)
at io.druid.indexer.spark.TestSparkDruidIndexer$$anonfun$1.apply(TestSparkDruidIndexer.scala:40)
at org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)
at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
at org.scalatest.Transformer.apply(Transformer.scala:22)
at org.scalatest.Transformer.apply(Transformer.scala:20)
at org.scalatest.FlatSpecLike$$anon$1.apply(FlatSpecLike.scala:1647)
at org.scalatest.Suite$class.withFixture(Suite.scala:1122)
at org.scalatest.FlatSpec.withFixture(FlatSpec.scala:1683)
at org.scalatest.FlatSpecLike$class.invokeWithFixture$1(FlatSpecLike.scala:1644)
at org.scalatest.FlatSpecLike$$anonfun$runTest$1.apply(FlatSpecLike.scala:1656)
at org.scalatest.FlatSpecLike$$anonfun$runTest$1.apply(FlatSpecLike.scala:1656)
at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
at org.scalatest.FlatSpecLike$class.runTest(FlatSpecLike.scala:1656)
at org.scalatest.FlatSpec.runTest(FlatSpec.scala:1683)
at org.scalatest.FlatSpecLike$$anonfun$runTests$1.apply(FlatSpecLike.scala:1714)
at org.scalatest.FlatSpecLike$$anonfun$runTests$1.apply(FlatSpecLike.scala:1714)
at
[jira] [Created] (SPARK-11016) Spark fails when running with a task that requires a more recent version of RoaringBitmaps
Charles Allen created SPARK-11016:
--

Summary: Spark fails when running with a task that requires a more recent version of RoaringBitmaps
Key: SPARK-11016
URL: https://issues.apache.org/jira/browse/SPARK-11016
Project: Spark
Issue Type: Bug
Affects Versions: 1.4.0
Reporter: Charles Allen

The following error appears during Kryo init whenever a more recent version (0.5.0 or later) of RoaringBitmap is required by a job. org/roaringbitmap/RoaringArray$Element was removed in 0.5.0.

{code}
A needed class was not found. This could be due to an error in your runpath.
Missing class: org/roaringbitmap/RoaringArray$Element
java.lang.NoClassDefFoundError: org/roaringbitmap/RoaringArray$Element
at org.apache.spark.serializer.KryoSerializer$.<init>(KryoSerializer.scala:338)
at org.apache.spark.serializer.KryoSerializer$.<clinit>(KryoSerializer.scala)
at org.apache.spark.serializer.KryoSerializer.newKryo(KryoSerializer.scala:93)
at org.apache.spark.serializer.KryoSerializerInstance.borrowKryo(KryoSerializer.scala:237)
at org.apache.spark.serializer.KryoSerializerInstance.<init>(KryoSerializer.scala:222)
at org.apache.spark.serializer.KryoSerializer.newInstance(KryoSerializer.scala:138)
at org.apache.spark.broadcast.TorrentBroadcast$.blockifyObject(TorrentBroadcast.scala:201)
at org.apache.spark.broadcast.TorrentBroadcast.writeBlocks(TorrentBroadcast.scala:102)
at org.apache.spark.broadcast.TorrentBroadcast.<init>(TorrentBroadcast.scala:85)
at org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:34)
at org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:63)
at org.apache.spark.SparkContext.broadcast(SparkContext.scala:1318)
at org.apache.spark.SparkContext$$anonfun$hadoopFile$1.apply(SparkContext.scala:1006)
at org.apache.spark.SparkContext$$anonfun$hadoopFile$1.apply(SparkContext.scala:1003)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
at org.apache.spark.SparkContext.withScope(SparkContext.scala:700)
at org.apache.spark.SparkContext.hadoopFile(SparkContext.scala:1003)
at org.apache.spark.SparkContext$$anonfun$textFile$1.apply(SparkContext.scala:818)
at org.apache.spark.SparkContext$$anonfun$textFile$1.apply(SparkContext.scala:816)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
at org.apache.spark.SparkContext.withScope(SparkContext.scala:700)
at org.apache.spark.SparkContext.textFile(SparkContext.scala:816)
{code}

See https://issues.apache.org/jira/browse/SPARK-5949 for related info