Re: wot no toggle ?
You may want to read about the JVM and gain some understanding of what you're talking about; then you'd know that those options have different meanings. You can view both at the same time, for example.

On Thu, Apr 16, 2020, 2:13 AM jane thorpe wrote:
> https://spark.apache.org/docs/3.0.0-preview/web-ui.html#storage-tab
>
> On the link in one of the screenshots there are two checkboxes:
> ON HEAP MEMORY
> OFF HEAP MEMORY
>
> That is as useful as a pussy on as Barry Humphries wearing a gold dress
> as Dame Edna average.
>
> Which monkey came up with that ?
> None of the monkeys here noticed that ?
>
> Ever heard of a toggle switch ?
> Look behind you.
> Look at the light switch.
> That is a toggle switch: ON/OFF.
>
> Jane thorpe
> janethor...@aol.com
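For context, the two checkboxes map to two distinct memory pools that Spark tracks separately, so filtering on both at once is meaningful. A minimal sketch of enabling off-heap storage alongside the default on-heap memory (the size, class name, and jar name here are made up for illustration):

```shell
# Hypothetical spark-submit enabling the off-heap pool.
# Off-heap memory is disabled by default; these two settings turn it on
# and size it, so blocks can be cached outside the JVM heap.
spark-submit \
  --conf spark.memory.offHeap.enabled=true \
  --conf spark.memory.offHeap.size=2g \
  --class com.example.MyApp myapp.jar
```

With off-heap enabled, the Storage tab can report blocks cached in either pool, which is why the UI offers two independent filters rather than a single ON/OFF toggle.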
Re: Going it alone.
> > I want to know if Spark is headed in my direction.
> You are implying Spark could be.

What direction are you headed in, exactly? I don't feel as if anything was implied when you were asked for use cases or what problem you are solving. You were asked to identify some use cases, of which you don't appear to have any.

On Tue, Apr 14, 2020 at 4:49 PM jane thorpe wrote:
> That's what I want to know: use cases.
> I am looking for direction as I described, and I want to know if Spark is
> headed in my direction.
>
> You are implying Spark could be.
>
> So tell me about the USE CASES and I'll do the rest.
> --
> On Tuesday, 14 April 2020 yeikel valdes wrote:
> It depends on your use case. What are you trying to solve?
>
> On Tue, 14 Apr 2020 15:36:50 -0400 janethor...@aol.com.INVALID wrote:
>
> Hi,
>
> I consider myself to be quite good in software development, especially
> using frameworks.
>
> I like to get my hands dirty. I have spent the last few months
> understanding modern frameworks and architectures.
>
> I am looking to invest my energy in a product where I don't have to
> rely on the monkeys which occupy this space we call software
> development.
>
> I have found one that meets my requirements.
>
> Would Apache Spark be a good tool for me, or do I need to be a member of a
> team to develop products using Apache Spark ?
Spark event logging with s3a
We are trying to use Spark event logging with s3a as a destination for event data. We added these settings to our spark-submits:

spark.eventLog.dir s3a://ourbucket/sparkHistoryServer/eventLogs
spark.eventLog.enabled true

Everything works fine with smaller jobs, and we can see the history data in the history server, which also uses s3a. However, when we tried a job with a few hundred gigabytes of data that goes through multiple stages, it died with an OOM exception (the same job works fine with spark.eventLog.enabled false):

18/10/22 23:07:22 ERROR util.Utils: uncaught error in thread SparkListenerBus, stopping SparkContext
java.lang.OutOfMemoryError
        at java.io.ByteArrayOutputStream.hugeCapacity(ByteArrayOutputStream.java:123)
        at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:117)
        at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)

Full stack trace: https://gist.github.com/davidhesson/bd64a25f04c6bb241ec398f5383d671c

Does anyone have any insight or experience with using the Spark history server with s3a? Could this problem perhaps be caused by something else in our configs? Any help would be appreciated.
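For reference, the same settings can be passed at submit time rather than via spark-defaults.conf. The sketch below also adds event-log compression, which is only a guess at reducing buffer pressure, not something verified against this particular OOM (the class and jar names are placeholders; the bucket path is the one from the post):

```shell
# Event logging to s3a, passed on the spark-submit command line.
# spark.eventLog.compress is a speculative addition: it compresses the
# event log stream, which may shrink the buffers the listener bus writes.
spark-submit \
  --conf spark.eventLog.enabled=true \
  --conf spark.eventLog.dir=s3a://ourbucket/sparkHistoryServer/eventLogs \
  --conf spark.eventLog.compress=true \
  --class com.example.BigJob bigjob.jar
```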
Seeing a framework registration loop with Spark 2.3.1 on DCOS 1.10.0
I’m attempting to use Spark 2.3.1 (spark-2.3.1-bin-hadoop2.7.tgz) in cluster mode and running into some issues. This is a cluster where we've had success using Spark 2.2.0 (spark-2.2.0-bin-hadoop2.7.tgz), and I'm simply upgrading our nodes with the new Spark 2.3.1 package and testing it out.

Some version information:
Spark v2.3.1
DC/OS v1.10.0
Mesos v1.4.0
Dispatcher: docker, mesosphere/spark:2.3.1-2.2.1-2-hadoop-2.6 (Docker image from https://github.com/mesosphere/spark-build)

This is a multi-node cluster. I'm submitting a job that uses the sample spark-pi jar included in the distribution. Occasionally, spark-submits run without issue. Then a run will begin execution where a bunch of TASK_LOST messages occur immediately, followed by the BlockManager attempting to remove a handful of non-existent executors. I can also see where the driver/scheduler begins making a tight loop of SUBSCRIBE requests to the master.mesos service. The request volume and frequency is so high that the mesos.master stops responding to other requests, eventually runs OOM, and systemd restarts the failed process.

If there is only one job running and it's able to start an executor (exactly one started in my sample logs), the job will eventually complete. However, if I deploy multiple jobs (five seemed to do the trick), I've seen cases where none of the jobs complete, and the cluster begins to have cascading failures because the master cannot service other API requests under the influx of REGISTER requests from the numerous Spark driver frameworks.
Logs:
Problematic run (stdout, stderr, mesos.master logs): https://gist.github.com/davidhesson/791cb3101db2521a51478ff4e2d22841
Successful run (stdout, stderr; for comparison): https://gist.github.com/davidhesson/66e32196834b849cd2919dba8275cd4a
Snippet of the flood of SUBSCRIBE requests hitting the master node: https://gist.github.com/davidhesson/2c5d22e4f87fad85ce975bc074289136
Spark submit JSON: https://gist.github.com/davidhesson/c0c77dffe48965650fd5bbb078731900