toLocalIterator creates as many jobs as # of partitions, and it ends up spamming Spark UI
Hi all, RDD.toLocalIterator() creates as many jobs as there are partitions, which spams the Spark UI, especially when the method is used on an RDD with hundreds or thousands of partitions. Does anyone have a way to work around this issue? What do people think about introducing a SparkContext local property (analogous to “spark.scheduler.pool”, which is set as a thread-local property) that determines whether the job info should be shown on the Spark UI? Thanks, Mingyu
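The workaround people usually reach for is to pull several partitions per job instead of one. Below is a pure-Python simulation of that idea (no Spark required; run_job and local_iterator are hypothetical stand-ins for SparkContext.runJob and toLocalIterator, not real APIs): with 8 partitions, per-partition fetching schedules 8 jobs, while fetching 4 partitions at a time schedules only 2.

```python
# Simulation only: each call to run_job stands in for one Spark job that
# would appear in the UI. Fetching partitions in batches cuts the job count.

def run_job(parts, job_log):
    """Pretend scheduler: one 'job' materializes the given partitions."""
    job_log.append(tuple(idx for idx, _ in parts))
    for _, data in parts:
        yield from data

def local_iterator(partitions, batch_size, job_log):
    """Stream elements to the 'driver', batch_size partitions per job."""
    for i in range(0, len(partitions), batch_size):
        yield from run_job(partitions[i:i + batch_size], job_log)

# 8 partitions of 2 elements each
partitions = [(i, [i * 10, i * 10 + 1]) for i in range(8)]

naive_log = []
naive = list(local_iterator(partitions, 1, naive_log))      # one job per partition

batched_log = []
batched = list(local_iterator(partitions, 4, batched_log))  # 4 partitions per job
```

Both variants yield the same elements in the same order, but the naive one logs 8 jobs and the batched one only 2. A SparkContext-local property to hide these jobs, as proposed above, would address the UI half of the problem without changing the job count.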
Re: Spilling when not expected
How did you run the Spark command? Maybe the memory setting didn't actually apply? How much memory does the web ui say is available? BTW - I don't think any JVM can actually handle 700G heap ... (maybe Zing). On Thu, Mar 12, 2015 at 4:09 PM, Tom Hubregtsen wrote: > Hi all, > > I'm running the teraSort benchmark with a relative small input set: 5GB. > During profiling, I can see I am using a total of 68GB. I've got a terabyte > of memory in my system, and set > spark.executor.memory 900g > spark.driver.memory 900g > I use the default for > spark.shuffle.memoryFraction > spark.storage.memoryFraction > I believe that I now have 0.2*900=180GB for shuffle and 0.6*900=540GB for > storage. > > I noticed a lot of variation in runtime (under the same load), and tracked > this down to this function in > core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala > private def spillToPartitionFiles(collection: > SizeTrackingPairCollection[(Int, K), C]): Unit = { > spillToPartitionFiles(collection.iterator) > } > In a slow run, it would loop through this function 12000 times, in a fast > run only 700 times, even though the settings in both runs are the same and > there are no other users on the system. When I look at the function calling > this (insertAll, also in ExternalSorter), I see that spillToPartitionFiles > is only called 700 times in both fast and slow runs, meaning that the > function recursively calls itself very often. Because of the function name, > I assume the system is spilling to disk. As I have sufficient memory, I > assume that I forgot to set a certain memory setting. Anybody any idea > which > other setting I have to set, in order to not spill data in this scenario? > > Thanks, > > Tom > > > > -- > View this message in context: > http://apache-spark-developers-list.1001551.n3.nabble.com/Spilling-when-not-expected-tp11017.html > Sent from the Apache Spark Developers List mailing list archive at > Nabble.com. 
> > - > To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org > For additional commands, e-mail: dev-h...@spark.apache.org > >
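As a quick arithmetic check of the fractions Tom cites, here is a sketch using the Spark 1.x default values spark.shuffle.memoryFraction = 0.2 and spark.storage.memoryFraction = 0.6 (the real accounting also applies safety fractions, so these are upper bounds):

```python
# Upper-bound estimate of shuffle and storage memory from the executor heap,
# using the Spark 1.x default fractions mentioned in the thread.

executor_memory_gb = 900  # spark.executor.memory 900g

shuffle_gb = 0.2 * executor_memory_gb  # ~180 GB available to shuffle
storage_gb = 0.6 * executor_memory_gb  # ~540 GB available to storage
```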
May we merge into branch-1.3 at this point?
Is the release certain enough that we can resume merging into branch-1.3 at this point? I have a number of back-ports queued up and didn't want to merge in case another last RC was needed. I see a few commits to the branch though.
Spark config option 'expression language' feedback request
PR#4937 ( https://github.com/apache/spark/pull/4937) is a feature to allow Spark configuration options (whether on the command line, in an environment variable, or in a configuration file) to be specified via a simple expression language. Such a feature has the following end-user benefits: - Allows for flexibility in specifying time intervals or byte quantities in appropriate and easy-to-follow units, e.g. 1 week rather than 604800 seconds - Allows for the scaling of a configuration option in relation to system attributes, e.g. SPARK_WORKER_CORES = numCores - 1 SPARK_WORKER_MEMORY = physicalMemoryBytes - 1.5 GB - Gives the ability to scale multiple configuration options together, e.g.: spark.driver.memory = 0.75 * physicalMemoryBytes spark.driver.maxResultSize = spark.driver.memory * 0.8 The following functions are currently supported by this PR: NumCores: Number of cores assigned to the JVM (usually == physical machine cores) PhysicalMemoryBytes: Memory size of hosting machine JVMTotalMemoryBytes: Current bytes of memory allocated to the JVM JVMMaxMemoryBytes: Maximum number of bytes of memory available to the JVM JVMFreeMemoryBytes: maxMemoryBytes - totalMemoryBytes I was wondering if anybody on the mailing list has any further ideas on other functions that could be useful to have when specifying Spark configuration options? Regards, Dale.
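To make the proposal concrete, here is a toy sketch of how such expressions could be resolved. This is illustrative Python, not the PR's actual Scala implementation; the attribute values in ATTRS are made up, and a real parser would validate input rather than call eval:

```python
# Toy resolver: substitute system attributes and other config keys into a
# config expression, then evaluate the arithmetic. Illustration only.

ATTRS = {"numCores": 8, "physicalMemoryBytes": 16 * 1024**3}  # assumed host

def resolve(config, key):
    """Evaluate config[key], expanding references to ATTRS and other keys."""
    expr = config[key]
    # expand references to other config keys first (e.g. spark.driver.memory)
    for other in config:
        if other != key and other in expr:
            expr = expr.replace(other, str(resolve(config, other)))
    # then expand system attributes
    for name, val in ATTRS.items():
        expr = expr.replace(name, str(val))
    return eval(expr, {"__builtins__": {}})  # a real implementation would parse

config = {
    "spark.driver.memory": "0.75 * physicalMemoryBytes",
    "spark.driver.maxResultSize": "spark.driver.memory * 0.8",
}

driver_mem = resolve(config, "spark.driver.memory")         # 12 GiB in bytes
max_result = resolve(config, "spark.driver.maxResultSize")  # 80% of driver_mem
```

The recursive expansion of one config key inside another is what makes the "scale multiple options together" benefit work: maxResultSize tracks driver.memory automatically.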
Re: Using CUDA within Spark / boosting linear algebra
Reynold, Prof. Canny gave me the slides yesterday; I will post the link to the slides to both the SF Big Analytics and SF Machine Learning meetups. Chester Sent from my iPad On Mar 12, 2015, at 22:53, Reynold Xin wrote: > Thanks for chiming in, John. I missed your meetup last night - do you have > any writeups or slides about roofline design? In particular, I'm curious > about what optimizations are available for power-law dense * sparse? (I > don't have any background in optimizations) > > > > On Thu, Mar 12, 2015 at 8:50 PM, jfcanny wrote: > >> If you're contemplating GPU acceleration in Spark, its important to look >> beyond BLAS. Dense BLAS probably account for only 10% of the cycles in the >> datasets we've tested in BIDMach, and we've tried to make them >> representative of industry machine learning workloads. Unless you're >> crunching images or audio, the majority of data will be very sparse and >> power law distributed. You need a good sparse BLAS, and in practice it >> seems >> like you need a sparse BLAS tailored for power-law data. We had to write >> our >> own since the NVIDIA libraries didnt perform well on typical power-law >> data. >> Intel MKL sparse BLAS also have issues and we only use some of them. >> >> You also need 2D reductions, scan operations, slicing, element-wise >> transcendental functions and operators, many kinds of sort, random number >> generators etc, and some kind of memory management strategy. Some of this >> was layered on top of Thrust in BIDMat, but most had to be written from >> scratch. Its all been rooflined, typically to memory throughput of current >> GPUs (around 200 GB/s). >> >> When you have all this you can write Learning Algorithms in the same >> high-level primitives available in Breeze or Numpy/Scipy. Its literally the >> same in BIDMat, since the generic matrix operations are implemented on both >> CPU and GPU, so the same code runs on either platform. 
>> >> A lesser known fact is that GPUs are around 10x faster for *all* those >> operations, not just dense BLAS. Its mostly due to faster streaming memory >> speeds, but some kernels (random number generation and transcendentals) are >> more than an order of magnitude thanks to some specialized hardware for >> power series on the GPU chip. >> >> When you have all this there is no need to move data back and forth across >> the PCI bus. The CPU only has to pull chunks of data off disk, unpack them, >> and feed them to the available GPUs. Most models fit comfortably in GPU >> memory these days (4-12 GB). With minibatch algorithms you can push TBs of >> data through the GPU this way. >> >> >> >> -- >> View this message in context: >> http://apache-spark-developers-list.1001551.n3.nabble.com/Using-CUDA-within-Spark-boosting-linear-algebra-tp10481p11021.html >> Sent from the Apache Spark Developers List mailing list archive at >> Nabble.com. >> >> - >> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org >> For additional commands, e-mail: dev-h...@spark.apache.org >> >> - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org
Re: Spark config option 'expression language' feedback request
I am curious how you are going to support these over Mesos and YARN. Any config change like this should be applicable to all of them, not just local and standalone modes. Regards Mridul On Friday, March 13, 2015, Dale Richardson wrote: > PR#4937 ( https://github.com/apache/spark/pull/4937) is a feature to > allow for Spark configuration options (whether on command line, environment > variable or a configuration file) to be specified via a simple expression > language. > > > Such a feature has the following end-user benefits: > - Allows for the flexibility in specifying time intervals or byte > quantities in appropriate and easy to follow units e.g. 1 week rather > rather then 604800 seconds > > - Allows for the scaling of a configuration option in relation to a system > attributes. e.g. > > SPARK_WORKER_CORES = numCores - 1 > > SPARK_WORKER_MEMORY = physicalMemoryBytes - 1.5 GB > > - Gives the ability to scale multiple configuration options together eg: > > spark.driver.memory = 0.75 * physicalMemoryBytes > > spark.driver.maxResultSize = spark.driver.memory * 0.8 > > > The following functions are currently supported by this PR: > NumCores: Number of cores assigned to the JVM (usually == > Physical machine cores) > PhysicalMemoryBytes: Memory size of hosting machine > > JVMTotalMemoryBytes: Current bytes of memory allocated to the JVM > > JVMMaxMemoryBytes:Maximum number of bytes of memory available to the > JVM > > JVMFreeMemoryBytes: maxMemoryBytes - totalMemoryBytes > > > I was wondering if anybody on the mailing list has any further ideas on > other functions that could be useful to have when specifying spark > configuration options? > Regards,Dale. >
Re: May we merge into branch-1.3 at this point?
Looks like the release is out: http://spark.apache.org/releases/spark-release-1-3-0.html Though, interestingly, I think we are missing the appropriate v1.3.0 tag: https://github.com/apache/spark/releases Nick On Fri, Mar 13, 2015 at 6:07 AM Sean Owen wrote: > Is the release certain enough that we can resume merging into > branch-1.3 at this point? I have a number of back-ports queued up and > didn't want to merge in case another last RC was needed. I see a few > commits to the branch though. > > - > To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org > For additional commands, e-mail: dev-h...@spark.apache.org > >
Re: May we merge into branch-1.3 at this point?
Yeah, I'm guessing that is all happening quite literally as we speak. The Apache git tag is the one of reference: https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=4aaf48d46d13129f0f9bdafd771dd80fe568a7dc Open season on 1.3 branch then... On Fri, Mar 13, 2015 at 4:20 PM, Nicholas Chammas wrote: > Looks like the release is out: > http://spark.apache.org/releases/spark-release-1-3-0.html > > Though, interestingly, I think we are missing the appropriate v1.3.0 tag: > https://github.com/apache/spark/releases > > Nick > > On Fri, Mar 13, 2015 at 6:07 AM Sean Owen wrote: >> >> Is the release certain enough that we can resume merging into >> branch-1.3 at this point? I have a number of back-ports queued up and >> didn't want to merge in case another last RC was needed. I see a few >> commits to the branch though. >> >> - >> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org >> For additional commands, e-mail: dev-h...@spark.apache.org >> > - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org
Re: May we merge into branch-1.3 at this point?
Who is managing 1.3 release ? You might want to coordinate with them before porting changes to branch. Regards Mridul On Friday, March 13, 2015, Sean Owen wrote: > Yeah, I'm guessing that is all happening quite literally as we speak. > The Apache git tag is the one of reference: > > https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=4aaf48d46d13129f0f9bdafd771dd80fe568a7dc > > Open season on 1.3 branch then... > > On Fri, Mar 13, 2015 at 4:20 PM, Nicholas Chammas > > wrote: > > Looks like the release is out: > > http://spark.apache.org/releases/spark-release-1-3-0.html > > > > Though, interestingly, I think we are missing the appropriate v1.3.0 tag: > > https://github.com/apache/spark/releases > > > > Nick > > > > On Fri, Mar 13, 2015 at 6:07 AM Sean Owen > wrote: > >> > >> Is the release certain enough that we can resume merging into > >> branch-1.3 at this point? I have a number of back-ports queued up and > >> didn't want to merge in case another last RC was needed. I see a few > >> commits to the branch though. > >> > >> - > >> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org > >> For additional commands, e-mail: dev-h...@spark.apache.org > > >> > > > > - > To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org > For additional commands, e-mail: dev-h...@spark.apache.org > >
Re: Spilling when not expected
I use the spark-submit script and the config files in a conf directory. I see the memory settings reflected in the stdout as well as in the web UI (it prints all variables from spark-defaults.conf, and mentions I have 540GB of free memory available when trying to store a broadcast variable or RDD). I also ran "ps -aux | grep java | grep th", which shows me that I called java with "-Xms1000g -Xmx1000g". I also tested whether these numbers are realistic for the J9 JVM. Outside of Spark, when setting just the initial heap size (Xms), it gives an error, but if I also define the maximum option with it (Xmx), it seems to accept it. Also, in IBM's J9 Health Center, I see it reserve the 900g and use up to 68g. Thanks, Tom On 13 March 2015 at 02:05, Reynold Xin wrote: > How did you run the Spark command? Maybe the memory setting didn't > actually apply? How much memory does the web ui say is available? > > BTW - I don't think any JVM can actually handle 700G heap ... (maybe Zing). > > On Thu, Mar 12, 2015 at 4:09 PM, Tom Hubregtsen > wrote: > >> Hi all, >> >> I'm running the teraSort benchmark with a relative small input set: 5GB. >> During profiling, I can see I am using a total of 68GB. I've got a >> terabyte >> of memory in my system, and set >> spark.executor.memory 900g >> spark.driver.memory 900g >> I use the default for >> spark.shuffle.memoryFraction >> spark.storage.memoryFraction >> I believe that I now have 0.2*900=180GB for shuffle and 0.6*900=540GB for >> storage. 
>> >> I noticed a lot of variation in runtime (under the same load), and tracked >> this down to this function in >> core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala >> private def spillToPartitionFiles(collection: >> SizeTrackingPairCollection[(Int, K), C]): Unit = { >> spillToPartitionFiles(collection.iterator) >> } >> In a slow run, it would loop through this function 12000 times, in a fast >> run only 700 times, even though the settings in both runs are the same and >> there are no other users on the system. When I look at the function >> calling >> this (insertAll, also in ExternalSorter), I see that spillToPartitionFiles >> is only called 700 times in both fast and slow runs, meaning that the >> function recursively calls itself very often. Because of the function >> name, >> I assume the system is spilling to disk. As I have sufficient memory, I >> assume that I forgot to set a certain memory setting. Anybody any idea >> which >> other setting I have to set, in order to not spill data in this scenario? >> >> Thanks, >> >> Tom >> >> >> >> -- >> View this message in context: >> http://apache-spark-developers-list.1001551.n3.nabble.com/Spilling-when-not-expected-tp11017.html >> Sent from the Apache Spark Developers List mailing list archive at >> Nabble.com. >> >> - >> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org >> For additional commands, e-mail: dev-h...@spark.apache.org >> >> >
Re: Using CUDA within Spark / boosting linear algebra
Hi Reynold, I left Chester with a copy of the slides, so I assume they'll be posted on the SF ML or Big Data sites. We have a draft paper under review. I can ask the co-authors about arxiv'ing it. We have a few heuristics for power-law data. One of them is to keep the feature set sorted by frequency. Power-law data has roughly the same mass in each power-of-two range of feature frequency. By keeping the most frequent features together, you get a lot more value out of the caches on the device (even GPUs have them, albeit smaller ones). E.g. with 100 million features, 1/2 of the feature instances will be in the range 1,...,10,000. If they're consecutive they will all hit a fast cache. Another 1/4 will be in 1,...,1,000,000, hitting the next cache, etc. Another is to subdivide sparse matrices using the vector of elements rather than rows or columns. Splitting power-law matrices by either rows or columns gives very uneven splits. That means we store sparse matrices in coordinate form rather than compressed row or column format. Other than that, rooflining gives you a goal that you should be able to reach. If you aren't at the limit, just knowing that gives you a target to aim at. You can try profiling the kernel to figure out why it's slower than it should be. There are a few common reasons (low occupancy, imbalanced thread blocks, thread divergence) that you can discover with the profiler. Then hopefully you can solve them. -John On 3/12/2015 10:56 PM, rxin [via Apache Spark Developers List] wrote: > Thanks for chiming in, John. I missed your meetup last night - do you > have > any writeups or slides about roofline design? In particular, I'm curious > about what optimizations are available for power-law dense * sparse? (I > don't have any background in optimizations) > > > > On Thu, Mar 12, 2015 at 8:50 PM, jfcanny <[hidden email] > > wrote: > > > If you're contemplating GPU acceleration in Spark, its important to > look > > beyond BLAS. 
Dense BLAS probably account for only 10% of the cycles > in the > > datasets we've tested in BIDMach, and we've tried to make them > > representative of industry machine learning workloads. Unless you're > > crunching images or audio, the majority of data will be very sparse and > > power law distributed. You need a good sparse BLAS, and in practice it > > seems > > like you need a sparse BLAS tailored for power-law data. We had to > write > > our > > own since the NVIDIA libraries didnt perform well on typical power-law > > data. > > Intel MKL sparse BLAS also have issues and we only use some of them. > > > > You also need 2D reductions, scan operations, slicing, element-wise > > transcendental functions and operators, many kinds of sort, random > number > > generators etc, and some kind of memory management strategy. Some of > this > > was layered on top of Thrust in BIDMat, but most had to be written from > > scratch. Its all been rooflined, typically to memory throughput of > current > > GPUs (around 200 GB/s). > > > > When you have all this you can write Learning Algorithms in the same > > high-level primitives available in Breeze or Numpy/Scipy. Its > literally the > > same in BIDMat, since the generic matrix operations are implemented > on both > > CPU and GPU, so the same code runs on either platform. > > > > A lesser known fact is that GPUs are around 10x faster for *all* those > > operations, not just dense BLAS. Its mostly due to faster streaming > memory > > speeds, but some kernels (random number generation and > transcendentals) are > > more than an order of magnitude thanks to some specialized hardware for > > power series on the GPU chip. > > > > When you have all this there is no need to move data back and forth > across > > the PCI bus. The CPU only has to pull chunks of data off disk, > unpack them, > > and feed them to the available GPUs. Most models fit comfortably in GPU > > memory these days (4-12 GB). 
With minibatch algorithms you can push > TBs of > > data through the GPU this way. > > > > > > > > -- > > View this message in context: > > > http://apache-spark-developers-list.1001551.n3.nabble.com/Using-CUDA-within-Spark-boosting-linear-algebra-tp10481p11021.html > > Sent from the Apache Spark Developers List mailing list archive at > > Nabble.com. > > > > - > > To unsubscribe, e-mail: [hidden email] > > > For additional commands, e-mail: [hidden email] > >
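One of John's points above, that splitting power-law matrices by rows gives badly skewed shards while splitting the flat element list does not, can be illustrated with a few lines of Python (made-up toy data, not BIDMach code):

```python
# Toy power-law-ish row lengths: a handful of rows hold most of the non-zeros.
row_nnz = [1000, 120, 30, 8, 3, 2, 1, 1]  # non-zeros per row

# Split by rows into two halves of 4 rows each: badly skewed.
by_rows = [sum(row_nnz[:4]), sum(row_nnz[4:])]  # [1158, 7]

# Store in coordinate (COO) form and split the element list: balanced.
coo_rows = [r for r, n in enumerate(row_nnz) for _ in range(n)]
half = len(coo_rows) // 2
by_elems = [half, len(coo_rows) - half]  # [582, 583]
```

This balance argument is why, as described above, the sparse matrices are stored in coordinate form rather than compressed row or column format when the data is power-law distributed.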
[ANNOUNCE] Announcing Spark 1.3!
Hi All, I'm happy to announce the availability of Spark 1.3.0! Spark 1.3.0 is the fourth release on the API-compatible 1.X line. It is Spark's largest release ever, with contributions from 172 developers and more than 1,000 commits! Visit the release notes [1] to read about the new features, or download [2] the release today. For errata in the contributions or release notes, please e-mail me *directly* (not on-list). Thanks to everyone who helped work on this release! [1] http://spark.apache.org/releases/spark-release-1-3-0.html [2] http://spark.apache.org/downloads.html
Re: [ANNOUNCE] Announcing Spark 1.3!
Kudos to the whole team for such a significant achievement! On Fri, Mar 13, 2015 at 10:00 AM, Patrick Wendell wrote: > Hi All, > > I'm happy to announce the availability of Spark 1.3.0! Spark 1.3.0 is > the fourth release on the API-compatible 1.X line. It is Spark's > largest release ever, with contributions from 172 developers and more > than 1,000 commits! > > Visit the release notes [1] to read about the new features, or > download [2] the release today. > > For errata in the contributions or release notes, please e-mail me > *directly* (not on-list). > > Thanks to everyone who helped work on this release! > > [1] http://spark.apache.org/releases/spark-release-1-3-0.html > [2] http://spark.apache.org/downloads.html > > - > To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org > For additional commands, e-mail: dev-h...@spark.apache.org > >
extended jenkins downtime monday, march 16th, plus some hints at the future
i'll be taking jenkins down for some much-needed plugin updates, as well as potentially upgrading jenkins itself. this will start at 730am PDT, and i'm hoping to have everything up by noon. the move to the anaconda python will take place in the next couple of weeks as i'm in the process of rebuilding my staging environment (much needed) to better reflect production, and allow me to better test the change. and finally, some teasers for what's coming up in the next month or so: * move to a fully puppetized environment (yay no more shell script deployments!) * virtualized workers (including multiple OSes -- OS X, ubuntu, ..., profit?) more details as they come. happy friday! shane
Re: May we merge into branch-1.3 at this point?
Hey Sean, Yes, go crazy. Once we close the release vote, it's open season to merge backports into that release. - Patrick On Fri, Mar 13, 2015 at 9:31 AM, Mridul Muralidharan wrote: > Who is managing 1.3 release ? You might want to coordinate with them before > porting changes to branch. > > Regards > Mridul > > On Friday, March 13, 2015, Sean Owen wrote: > >> Yeah, I'm guessing that is all happening quite literally as we speak. >> The Apache git tag is the one of reference: >> >> https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=4aaf48d46d13129f0f9bdafd771dd80fe568a7dc >> >> Open season on 1.3 branch then... >> >> On Fri, Mar 13, 2015 at 4:20 PM, Nicholas Chammas >> > wrote: >> > Looks like the release is out: >> > http://spark.apache.org/releases/spark-release-1-3-0.html >> > >> > Though, interestingly, I think we are missing the appropriate v1.3.0 tag: >> > https://github.com/apache/spark/releases >> > >> > Nick >> > >> > On Fri, Mar 13, 2015 at 6:07 AM Sean Owen > > wrote: >> >> >> >> Is the release certain enough that we can resume merging into >> >> branch-1.3 at this point? I have a number of back-ports queued up and >> >> didn't want to merge in case another last RC was needed. I see a few >> >> commits to the branch though. >> >> >> >> - >> >> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org >> >> For additional commands, e-mail: dev-h...@spark.apache.org >> >> >> >> > >> >> - >> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org >> For additional commands, e-mail: dev-h...@spark.apache.org >> >> - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org
Re: Spark config option 'expression language' feedback request
This is an interesting idea. Are there well-known libraries for doing this? Config is the one place where it would be great to have something ridiculously simple, so it is more or less bug-free. I'm concerned about the complexity in this patch and the subtle bugs it might introduce into config options, for which users would have no workaround. I also believe it is fairly hard to propagate nice error messages when using Scala's parser combinators. On Fri, Mar 13, 2015 at 3:07 AM, Dale Richardson wrote: > > PR#4937 ( https://github.com/apache/spark/pull/4937) is a feature to > allow for Spark configuration options (whether on command line, environment > variable or a configuration file) to be specified via a simple expression > language. > > > Such a feature has the following end-user benefits: > - Allows for the flexibility in specifying time intervals or byte > quantities in appropriate and easy to follow units e.g. 1 week rather > rather then 604800 seconds > > - Allows for the scaling of a configuration option in relation to a system > attributes. e.g. > > SPARK_WORKER_CORES = numCores - 1 > > SPARK_WORKER_MEMORY = physicalMemoryBytes - 1.5 GB > > - Gives the ability to scale multiple configuration options together eg: > > spark.driver.memory = 0.75 * physicalMemoryBytes > > spark.driver.maxResultSize = spark.driver.memory * 0.8 > > > The following functions are currently supported by this PR: > NumCores: Number of cores assigned to the JVM (usually == > Physical machine cores) > PhysicalMemoryBytes: Memory size of hosting machine > > JVMTotalMemoryBytes: Current bytes of memory allocated to the JVM > > JVMMaxMemoryBytes:Maximum number of bytes of memory available to the > JVM > > JVMFreeMemoryBytes: maxMemoryBytes - totalMemoryBytes > > > I was wondering if anybody on the mailing list has any further ideas on > other functions that could be useful to have when specifying spark > configuration options? > Regards,Dale. >
PR Builder timing out due to ivy cache lock
Looks like something is causing the PR Builder to time out since this morning, with the ivy cache being locked. Any idea what is happening?
Re: PR Builder timing out due to ivy cache lock
link to a build, please? On Fri, Mar 13, 2015 at 11:53 AM, Hari Shreedharan < hshreedha...@cloudera.com> wrote: > Looks like something is causing the PR Builder to timeout since this > morning with the ivy cache being locked. > > Any idea what is happening? >
jenkins httpd being flaky
we just started having issues when visiting jenkins and getting 503 service unavailable errors. i'm on it and will report back with an all-clear.
Re: PR Builder timing out due to ivy cache lock
Here you are: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28571/consoleFull On Fri, Mar 13, 2015 at 11:58 AM, shane knapp wrote: > link to a build, please? > > On Fri, Mar 13, 2015 at 11:53 AM, Hari Shreedharan < > hshreedha...@cloudera.com> wrote: > >> Looks like something is causing the PR Builder to timeout since this >> morning with the ivy cache being locked. >> >> Any idea what is happening? >> > >
Re: SparkSQL 1.3.0 (RC3) failed to read parquet file generated by 1.1.1
Here is the JIRA: https://issues.apache.org/jira/browse/SPARK-6315 On Thu, Mar 12, 2015 at 11:00 PM, Michael Armbrust wrote: > We are looking at the issue and will likely fix it for Spark 1.3.1. > > On Thu, Mar 12, 2015 at 8:25 PM, giive chen wrote: > >> Hi all >> >> My team has the same issue. It looks like Spark 1.3's sparkSQL cannot read >> parquet file generated by Spark 1.1. It will cost a lot of migration work >> when we wanna to upgrade Spark 1.3. >> >> Is there anyone can help me? >> >> >> Thanks >> >> Wisely Chen >> >> >> On Tue, Mar 10, 2015 at 5:06 PM, Pei-Lun Lee wrote: >> >> > Hi, >> > >> > I found that if I try to read parquet file generated by spark 1.1.1 >> using >> > 1.3.0-rc3 by default settings, I got this error: >> > >> > com.fasterxml.jackson.core.JsonParseException: Unrecognized token >> > 'StructType': was expecting ('true', 'false' or 'null') >> > at [Source: StructType(List(StructField(a,IntegerType,false))); line: >> 1, >> > column: 11] >> > at >> > >> com.fasterxml.jackson.core.JsonParser._constructError(JsonParser.java:1419) >> > at >> > >> > >> com.fasterxml.jackson.core.base.ParserMinimalBase._reportError(ParserMinimalBase.java:508) >> > at >> > >> > >> com.fasterxml.jackson.core.json.ReaderBasedJsonParser._reportInvalidToken(ReaderBasedJsonParser.java:2300) >> > at >> > >> > >> com.fasterxml.jackson.core.json.ReaderBasedJsonParser._handleOddValue(ReaderBasedJsonParser.java:1459) >> > at >> > >> > >> com.fasterxml.jackson.core.json.ReaderBasedJsonParser.nextToken(ReaderBasedJsonParser.java:683) >> > at >> > >> > >> com.fasterxml.jackson.databind.ObjectMapper._initForReading(ObjectMapper.java:3105) >> > at >> > >> > >> com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:3051) >> > at >> > >> > >> com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:2161) >> > at >> org.json4s.jackson.JsonMethods$class.parse(JsonMethods.scala:19) >> > at 
org.json4s.jackson.JsonMethods$.parse(JsonMethods.scala:44) >> > at >> > org.apache.spark.sql.types.DataType$.fromJson(dataTypes.scala:41) >> > at >> > >> > >> org.apache.spark.sql.parquet.ParquetRelation2$$anonfun$readSchema$1$$anonfun$25.apply(newParquet.scala:675) >> > at >> > >> > >> org.apache.spark.sql.parquet.ParquetRelation2$$anonfun$readSchema$1$$anonfun$25.apply(newParquet.scala:675) >> > >> > >> > >> > this is how I save parquet file with 1.1.1: >> > >> > sql("select 1 as a").saveAsParquetFile("/tmp/foo") >> > >> > >> > >> > and this is the meta data of the 1.1.1 parquet file: >> > >> > creator: parquet-mr version 1.4.3 >> > extra: org.apache.spark.sql.parquet.row.metadata = >> > StructType(List(StructField(a,IntegerType,false))) >> > >> > >> > >> > by comparison, this is 1.3.0 meta: >> > >> > creator: parquet-mr version 1.6.0rc3 >> > extra: org.apache.spark.sql.parquet.row.metadata = >> > {"type":"struct","fields":[{"name":"a","type":"integer","nullable":t >> > [more]... >> > >> > >> > >> > It looks like now ParquetRelation2 is used to load parquet file by >> default >> > and it only recognizes JSON format schema but 1.1.1 schema was case >> class >> > string format. >> > >> > Setting spark.sql.parquet.useDataSourceApi to false will fix it, but I >> > don't know the differences. >> > Is this considered a bug? We have a lot of parquet files from 1.1.1, >> should >> > we disable data source api in order to read them if we want to upgrade >> to >> > 1.3? >> > >> > Thanks, >> > -- >> > Pei-Lun >> > >> > >
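For readers hitting the same migration problem: the root cause is that 1.1.1 wrote the schema as a Scala case-class string while 1.3.0 writes JSON, and the new data source path only tries the JSON parser. A reader that wanted to accept both could dispatch on whether the metadata parses as JSON (an illustrative sketch only, not Spark's actual fix):

```python
import json

# Example metadata strings from the thread: old case-class format vs. new JSON.
OLD_META = "StructType(List(StructField(a,IntegerType,false)))"
NEW_META = '{"type":"struct","fields":[{"name":"a","type":"integer","nullable":true}]}'

def schema_format(metadata: str) -> str:
    """Classify the parquet schema metadata so the right parser can be chosen."""
    try:
        json.loads(metadata)
        return "json"        # Spark 1.3-style schema
    except ValueError:
        return "case-class"  # Spark 1.1-style schema; needs the legacy parser
```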
Re: jenkins httpd being flaky
ok we have a few different things happening: 1) httpd on the jenkins master is randomly (though not currently) flaking out and causing visits to the site to return a 503. nothing in the logs shows any problems. 2) there are some github timeouts, which i tracked down and think it's a problem with github themselves (see: https://status.github.com/ and scroll down to 'mean hook delivery time') 3) we have one spark job w/a strange ivy lock issue, that i just retriggered (https://github.com/apache/spark/pull/4964) 4) there's an errant, unkillable pull request builder job ( https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28574/console ) more updates forthcoming. On Fri, Mar 13, 2015 at 12:04 PM, shane knapp wrote: > we just started having issues when visiting jenkins and getting 503 > service unavailable errors. > > i'm on it and will report back with an all-clear. >
Spark ThriftServer encounter java.lang.IllegalArgumentException: Unknown auth type: null Allowed values are: [auth-int, auth-conf, auth]
When Kerberos is enabled, I get the following exception (Spark 1.2.1, git commit b6eaf77d4332bfb0a698849b1f5f917d20d70e97; Hive 0.13.1; Apache Hadoop 2.4.1) when starting the Spark ThriftServer.

Command to start the thriftserver:

./start-thriftserver.sh --hiveconf hive.server2.thrift.port=2 --hiveconf hive.server2.thrift.bind.host=$(hostname) --master yarn-client

Error message in spark.log:

2015-03-13 18:26:05,363 ERROR org.apache.hive.service.cli.thrift.ThriftCLIService (ThriftBinaryCLIService.java:run(93)) - Error:
java.lang.IllegalArgumentException: Unknown auth type: null Allowed values are: [auth-int, auth-conf, auth]
    at org.apache.hive.service.auth.SaslQOP.fromString(SaslQOP.java:56)
    at org.apache.hive.service.auth.HiveAuthFactory.getSaslProperties(HiveAuthFactory.java:118)
    at org.apache.hive.service.auth.HiveAuthFactory.getAuthTransFactory(HiveAuthFactory.java:133)
    at org.apache.hive.service.cli.thrift.ThriftBinaryCLIService.run(ThriftBinaryCLIService.java:43)
    at java.lang.Thread.run(Thread.java:744)

I'm wondering if this is due to the same problem described in HIVE-8154 and HIVE-7620, arising from an older code base for the Spark ThriftServer? Any insights are appreciated. Currently, I can't get the Spark ThriftServer to run against a Kerberos cluster (Apache Hadoop 2.4.1). My hive-site.xml in spark/conf looks like the following.
hive.semantic.analyzer.factory.impl = org.apache.hcatalog.cli.HCatSemanticAnalyzerFactory
hive.metastore.execute.setugi = true
hive.stats.autogather = false
hive.session.history.enabled = true
hive.querylog.location = /home/hive/log/${user.name}
hive.exec.local.scratchdir = /tmp/hive/scratch/${user.name}
hive.metastore.uris = thrift://somehostname:9083
hive.server2.authentication = KERBEROS
hive.server2.authentication.kerberos.principal = ***
hive.server2.authentication.kerberos.keytab = ***
hive.server2.thrift.sasl.qop = auth  (Sasl QOP value; one of 'auth', 'auth-int' and 'auth-conf')
hive.server2.enable.impersonation = true  (Enable user impersonation for HiveServer2)
hive.metastore.sasl.enabled = true
hive.metastore.kerberos.keytab.file = ***
hive.metastore.kerberos.principal = ***
hive.metastore.cache.pinobjtypes = Table,Database,Type,FieldSchema,Order
hdfs_sentinel_file = ***
hive.metastore.warehouse.dir = /hive
hive.metastore.client.socket.timeout = 600
hive.warehouse.subdir.inherit.perms = true
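For context on the exception in this thread: the stack trace shows Hive's SaslQOP.fromString rejecting a null QOP value, which suggests hive.server2.thrift.sasl.qop never reaches the Thrift service. A minimal sketch that mirrors the observed lookup behaviour (a re-implementation for illustration only, not the actual Hive source):

```scala
// Illustrative sketch of Hive's SaslQOP string-to-enum lookup, mirroring
// the error in the stack trace above. The real class lives in
// org.apache.hive.service.auth; this is NOT its actual source.
object SaslQop {
  // The three SASL QOP levels accepted for hive.server2.thrift.sasl.qop.
  private val allowed = Seq("auth-int", "auth-conf", "auth")

  def fromString(value: String): String =
    if (allowed.contains(value)) value
    else throw new IllegalArgumentException(
      s"Unknown auth type: $value Allowed values are: [${allowed.mkString(", ")}]")
}
```

If the setting in hive-site.xml were being read, fromString would receive "auth" rather than null, so the exception points at the config value not being propagated to the Thrift CLI service.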
Re: jenkins httpd being flaky
i tried a couple of things, but will also be doing a jenkins reboot as soon as the current batch of builds finish. On Fri, Mar 13, 2015 at 12:40 PM, shane knapp wrote: > ok we have a few different things happening: > > 1) httpd on the jenkins master is randomly (though not currently) flaking > out and causing visits to the site to return a 503. nothing in the logs > shows any problems. > > 2) there are some github timeouts, which i tracked down and think it's a > problem with github themselves (see: https://status.github.com/ and > scroll down to 'mean hook delivery time') > > 3) we have one spark job w/a strange ivy lock issue, that i just > retriggered (https://github.com/apache/spark/pull/4964) > > 4) there's an errant, unkillable pull request builder job ( > https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28574/console > ) > > more updates forthcoming. > > On Fri, Mar 13, 2015 at 12:04 PM, shane knapp wrote: > >> we just started having issues when visiting jenkins and getting 503 >> service unavailable errors. >> >> i'm on it and will report back with an all-clear. >> > >
Re: jenkins httpd being flaky
ok, things seem to have stabilized... httpd hasn't flaked since ~noon, the hanging PRB job on amp-jenkins-worker-06 was removed w/the restart and things are now building. i cancelled and retriggered a bunch of PRB builds, btw: 4848 (https://github.com/apache/spark/pull/3699) 5922 (https://github.com/apache/spark/pull/4733) 5987 (https://github.com/apache/spark/pull/4986) 6222 (https://github.com/apache/spark/pull/4964) 6325 (https://github.com/apache/spark/pull/5018) as well as: spark-master-maven-with-yarn sorry for the inconvenience... i'm still a little stumped as to what happened, but i think it was a confluence of events (httpd flaking, problems at github, mercury in retrograde, friday thinking it's monday). shane On Fri, Mar 13, 2015 at 1:08 PM, shane knapp wrote: > i tried a couple of things, but will also be doing a jenkins reboot as > soon as the current batch of builds finish. > > > > On Fri, Mar 13, 2015 at 12:40 PM, shane knapp wrote: > >> ok we have a few different things happening: >> >> 1) httpd on the jenkins master is randomly (though not currently) flaking >> out and causing visits to the site to return a 503. nothing in the logs >> shows any problems. >> >> 2) there are some github timeouts, which i tracked down and think it's a >> problem with github themselves (see: https://status.github.com/ and >> scroll down to 'mean hook delivery time') >> >> 3) we have one spark job w/a strange ivy lock issue, that i just >> retriggered (https://github.com/apache/spark/pull/4964) >> >> 4) there's an errant, unkillable pull request builder job ( >> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28574/console >> ) >> >> more updates forthcoming. >> >> On Fri, Mar 13, 2015 at 12:04 PM, shane knapp >> wrote: >> >>> we just started having issues when visiting jenkins and getting 503 >>> service unavailable errors. >>> >>> i'm on it and will report back with an all-clear. >>> >> >> >
Re: PR Builder timing out due to ivy cache lock
i'm thinking that this was something transient, and hopefully won't happen again. a ton of weird stuff happened around the time of this failure (see my flaky httpd email), and this was the only build exhibiting this behavior. i'll keep an eye out for this failure over the weekend... On Fri, Mar 13, 2015 at 12:03 PM, Hari Shreedharan < hshreedha...@cloudera.com> wrote: > Here you are: > https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28571/consoleFull > > On Fri, Mar 13, 2015 at 11:58 AM, shane knapp wrote: > >> link to a build, please? >> >> On Fri, Mar 13, 2015 at 11:53 AM, Hari Shreedharan < >> hshreedha...@cloudera.com> wrote: >> >>> Looks like something is causing the PR Builder to timeout since this >>> morning with the ivy cache being locked. >>> >>> Any idea what is happening? >>> >> >> >
RE: Spark config option 'expression language' feedback request
Hi Reynold,

Those are very good questions.

Re: Known libraries
There are a number of well-known libraries we could use to implement this feature, including MVEL, OGNL, JBoss EL, or even Spring's EL. I looked at using them to prototype this feature in the beginning, but they all ended up bringing in a lot of code to service a pretty small functional requirement. The prime requirements I was trying to meet were:
1. Be able to specify quantities in kb, mb, gb etc. transparently.
2. Be able to specify some options as fractions of system attributes, e.g. cpuCores * 0.8.
By implementing just this functionality and nothing else, I figured I was constraining things enough that end users got useful functionality, but not enough to shoot themselves in the foot in new and interesting ways. I couldn't see a nice way of limiting the expressiveness of third-party libraries to this extent. I'd be happy to re-examine the feasibility of pulling in one of the third-party libraries if you think that approach has more merit, but I do caution that we may be opening a Pandora's box: those libraries carry a lot of (potentially excess) functionality.

Re: Code complexity
I wrote the bare minimum code I could come up with to service the above-mentioned functionality, then refactored it to use a stacked-traits pattern, which increased the code size by about a further 30%. The expression code as it stands is pretty minimal and has more than 120 unit tests proving its functionality. More than half the code is taken up by utility classes that allow easy reference to byte quantities and time units. The design was deliberately limited to the requirements above, and not much more, to reduce the chance of other subtleties raising their heads.

Re: Workarounds
It would be pretty simple to implement fallback behaviour to disable expression parsing by:
1. Globally, having a configuration option that disables all expression parsing and falls back to simple Java property parsing.
2. Locally, having a known prefix that disables expression parsing for that option.
This should give enough workarounds to keep things running in the unlikely event that something crops up, no matter what happens.

Re: Error messages
In regards to your comment about nice error messages, I would have to agree with you: it would have been nice. In the end I just return an Option[Double] to the calling code if the entire string is parsed correctly. Given the additional complexity that adding error messages involved, I retrospectively justify this by asking how much info you need to debug an expression like 'cpuCores * 0.8'. :)

Thanks for the feedback.

Regards, Dale.

> From: r...@databricks.com
> Date: Fri, 13 Mar 2015 11:26:44 -0700
> Subject: Re: Spark config option 'expression language' feedback request
> To: dale...@hotmail.com
> CC: dev@spark.apache.org
>
> This is an interesting idea.
>
> Are there well known libraries for doing this? Config is the one place
> where it would be great to have something ridiculously simple, so it is
> more or less bug free. I'm concerned about the complexity in this patch and
> subtle bugs that it might introduce to config options that users will have
> no workarounds. Also I believe it is fairly hard for nice error messages to
> propagate when using Scala's parser combinator.
>
> On Fri, Mar 13, 2015 at 3:07 AM, Dale Richardson
> wrote:
> >
> > PR#4937 ( https://github.com/apache/spark/pull/4937) is a feature to
> > allow for Spark configuration options (whether on command line, environment
> > variable or a configuration file) to be specified via a simple expression
> > language.
> >
> > Such a feature has the following end-user benefits:
> > - Allows for the flexibility in specifying time intervals or byte
> > quantities in appropriate and easy to follow units e.g.
1 week > > rather than 604800 seconds
> >
> > - Allows for the scaling of a configuration option in relation to system
> > attributes, e.g.
> >
> > SPARK_WORKER_CORES = numCores - 1
> >
> > SPARK_WORKER_MEMORY = physicalMemoryBytes - 1.5 GB
> >
> > - Gives the ability to scale multiple configuration options together, e.g.:
> >
> > spark.driver.memory = 0.75 * physicalMemoryBytes
> >
> > spark.driver.maxResultSize = spark.driver.memory * 0.8
> >
> > The following functions are currently supported by this PR:
> > NumCores: Number of cores assigned to the JVM (usually ==
> > physical machine cores)
> > PhysicalMemoryBytes: Memory size of the hosting machine
> >
> > JVMTotalMemoryBytes: Current bytes of memory allocated to the JVM
> >
> > JVMMaxMemoryBytes: Maximum number of bytes of memory available to the
> > JVM
> >
> > JVMFreeMemoryBytes: maxMemoryBytes - totalMemoryBytes
> >
> > I was wondering if anybody on the mailing list has any further ideas on
> > other functions that could be useful to have when specifying spark
> > configuration options?
> > Regards, Dale.
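Dale's two core requirements (unit-suffixed quantities and simple arithmetic over system attributes) can be illustrated with a tiny evaluator. This is a hedged sketch of the idea only, not the code in PR#4937: the attribute names mirror the functions listed in the thread, and the single-binary-operation grammar is an assumption made to keep the example short.

```scala
// Illustrative sketch (NOT PR#4937's implementation): evaluate expressions
// like "cpuCores * 0.8" against a map of known system attributes.
object ConfExpr {
  // Hypothetical attribute table; a real implementation would expose the
  // full list from the thread (PhysicalMemoryBytes, JVMMaxMemoryBytes, ...).
  val attrs: Map[String, Double] = Map(
    "numCores"          -> Runtime.getRuntime.availableProcessors.toDouble,
    "jvmMaxMemoryBytes" -> Runtime.getRuntime.maxMemory.toDouble)

  private val ops: Map[String, (Double, Double) => Double] =
    Map("+" -> (_ + _), "-" -> (_ - _), "*" -> (_ * _), "/" -> (_ / _))

  // Grammar (deliberately minimal): <term> | <term> <op> <term>,
  // where a term is a named attribute or a numeric literal.
  // Returns None on any parse failure, matching the Option[Double]
  // behaviour Dale describes above.
  def eval(expr: String, env: Map[String, Double] = attrs): Option[Double] = {
    def term(t: String): Option[Double] =
      env.get(t).orElse(scala.util.Try(t.toDouble).toOption)
    expr.trim.split("\\s+") match {
      case Array(t)        => term(t)
      case Array(a, op, b) => for (f <- ops.get(op); x <- term(a); y <- term(b)) yield f(x, y)
      case _               => None
    }
  }
}
```

A real implementation would also need operator precedence, parentheses, unit suffixes, and references to other config keys, which is presumably where most of the PR's complexity lives.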
RE: Spark config option 'expression language' feedback request
Thanks for your questions, Mridul. I assume you are referring to how the functionality to query system state works on Yarn and Mesos? The APIs used are the standard JVM APIs, so the functionality will work without change. There is no real use case for using 'physicalMemoryBytes' in these cases though, as the JVM size has already been limited by the resource manager.

Regards, Dale.

> Date: Fri, 13 Mar 2015 08:20:33 -0700
> Subject: Re: Spark config option 'expression language' feedback request
> From: mri...@gmail.com
> To: dale...@hotmail.com
> CC: dev@spark.apache.org
>
> I am curious how you are going to support these over mesos and yarn.
> Any configuration change like this should be applicable to all of them, not
> just local and standalone modes.
>
> Regards
> Mridul
>
> On Friday, March 13, 2015, Dale Richardson wrote:
> > [snip: PR#4937 description quoted in full above]
Re: Spark config option 'expression language' feedback request
Let me try to rephrase my query: how can a user specify, for example, what the executor memory or the number of cores should be? I don't want a situation where some variables can be specified using one set of idioms (from this PR, for example) and another set cannot.

Regards,
Mridul

On Fri, Mar 13, 2015 at 4:06 PM, Dale Richardson wrote:
> Thanks for your questions Mridul.
> I assume you are referring to how the functionality to query system state
> works in Yarn and Mesos?
> The API's used are the standard JVM API's so the functionality will work
> without change. There is no real use case for using 'physicalMemoryBytes' in
> these cases though, as the JVM size has already been limited by the resource
> manager.
> Regards,Dale.
>> [snip: earlier messages quoted in full above]

- To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org
RE: Spark config option 'expression language' feedback request
Mridul, I may have added some confusion by giving examples in completely different areas.

For example, the number of cores available for tasking on each worker machine is a resource-manager-level configuration variable. In standalone mode (i.e. using Spark's home-grown resource manager), the configuration variable SPARK_WORKER_CORES is an item that Spark admins can set (and we can use expressions for). The equivalent variable for YARN (yarn.nodemanager.resource.cpu-vcores) is only used by YARN's NodeManager setup; it is set by YARN administrators and is outside the control of Spark (and most users). If you are not a cluster administrator then both variables are irrelevant to you. The same goes for SPARK_WORKER_MEMORY.

As for spark.executor.memory: as there is no way to know the attributes of a machine before a task is allocated to it, we cannot use any of the JVMInfo functions. For options like that, the expression parser can easily be limited to supporting different byte units of scale (kb/mb/gb etc.) and references to other configuration variables only.

Regards, Dale.

> Date: Fri, 13 Mar 2015 17:30:51 -0700
> Subject: Re: Spark config option 'expression language' feedback request
> From: mri...@gmail.com
> To: dale...@hotmail.com
> CC: dev@spark.apache.org
>
> Let me try to rephrase my query.
> How can a user specify, for example, what the executor memory should
> be or number of cores should be.
>
> I dont want a situation where some variables can be specified using
> one set of idioms (from this PR for example) and another set cannot
> be.
>
> Regards,
> Mridul
>
> On Fri, Mar 13, 2015 at 4:06 PM, Dale Richardson wrote:
> > [snip: earlier messages quoted in full above]
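The fallback Dale describes for machine-dependent options like spark.executor.memory (byte units of scale only, no system probing) can be sketched as follows. The regex and unit table are assumptions for illustration, not the PR's actual parser:

```scala
// Sketch (assumed semantics, not PR#4937's code): convert a
// "<number> <unit>" string such as "1.5 GB" into a byte count,
// supporting the kb/mb/gb units mentioned in the thread.
object ByteUnits {
  private val units: Map[String, Long] = Map(
    "b"  -> 1L,
    "kb" -> 1024L,
    "mb" -> 1024L * 1024,
    "gb" -> 1024L * 1024 * 1024,
    "tb" -> 1024L * 1024 * 1024 * 1024)

  // Number (optionally fractional), optional whitespace, unit suffix;
  // case-insensitive so "1.5 GB" and "512mb" both parse.
  private val Pattern = """(?i)\s*([0-9]*\.?[0-9]+)\s*([kmgt]?b)\s*""".r

  def parseBytes(s: String): Option[Long] = s match {
    case Pattern(num, unit) =>
      units.get(unit.toLowerCase).map(m => (num.toDouble * m).toLong)
    case _ => None
  }
}
```

Because this only touches the string itself (plus, in the PR's case, other config keys), it stays valid even when the option is resolved on a YARN or Mesos node whose hardware is unknown at submit time, which addresses Mridul's concern about idioms that only work in standalone mode.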