[GitHub] spark pull request: [SPARK-4666] Improve YarnAllocator's parsing o...

ryan-williams Tue, 10 Feb 2015 13:01:15 -0800

Github user ryan-williams commented on a diff in the pull request:

    https://github.com/apache/spark/pull/3525#discussion_r24449997
  
    --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
    @@ -980,33 +980,77 @@ private[spark] object Utils extends Logging {
         )
       }
     
    +
    +  private val TB = 1L << 40
    +  private val GB = 1L << 30
    +  private val MB = 1L << 20
    +  private val KB = 1L << 10
    +
    +  private val scaleCharToFactor: Map[Char, Long] = Map(
    +    'b' -> 1L,
    +    'k' -> KB,
    +    'm' -> MB,
    +    'g' -> GB,
    +    't' -> TB
    +  )
    +
       /**
    -   * Convert a Java memory parameter passed to -Xmx (such as 300m or 1g) 
to a number of megabytes.
    -   */
    -  def memoryStringToMb(str: String): Int = {
    +   * Convert a Java memory parameter passed to -Xmx (such as "300m" or 
"1g") to a number of
    +   * megabytes (or other byte-scale denominations as specified by 
@outputScaleChar).
    +   *
    +   * For @defaultInputScaleChar and @outputScaleChar, valid values are: 
'b' (bytes), 'k'
    +   * (kilobytes), 'm' (megabytes), 'g' (gigabytes), and 't' (terabytes).
    +   *
    +   * @param str String to parse an amount of memory out of
    +   * @param defaultInputScaleChar if no "scale" is provided on the end of 
@str (i.e. @str is a
    +   *                              plain numeric value), assume this scale 
(default: 'b' for
    +   *                              'bytes')
    +   * @param outputScaleChar express the output in this scale, i.e. number 
of bytes, kilobytes,
    +   *                        megabytes, or gigabytes.
    +   */
    +  def parseMemoryString(
    +      str: String,
    +      defaultInputScaleChar: Char = 'b',
    +      outputScaleChar: Char = 'm'): Long = {
    +
         val lower = str.toLowerCase
    -    if (lower.endsWith("k")) {
    -      (lower.substring(0, lower.length-1).toLong / 1024).toInt
    -    } else if (lower.endsWith("m")) {
    -      lower.substring(0, lower.length-1).toInt
    -    } else if (lower.endsWith("g")) {
    -      lower.substring(0, lower.length-1).toInt * 1024
    -    } else if (lower.endsWith("t")) {
    -      lower.substring(0, lower.length-1).toInt * 1024 * 1024
    -    } else {// no suffix, so it's just a number in bytes
    -      (lower.toLong / 1024 / 1024).toInt
    -    }
    +    val lastChar = lower(lower.length - 1)
    +    val (num, inputScaleChar) =
    +      if (lastChar.isDigit) {
    +        (lower.toLong, defaultInputScaleChar)
    +      } else {
    +        (lower.substring(0, lower.length - 1).toLong, lastChar)
    +      }
    +
    +    (for {
    --- End diff --
    
    > Why is a for construction used here? just to handle the invalid in / out 
scale param?
    
    `for` comprehensions are commonly used when `map`ing over 2 or more monads 
to avoid arguably-ugly `.flatMap`-chaining; the syntax just removes some 
boilerplate, e.g.:
    
    ```
        scaleCharToFactor.get(inputScaleChar).flatMap(inputScale => 
          scaleCharToFactor.get(outputScaleChar).map(outputScale =>
            inputScale * num / outputScale
          )
        ).getOrElse(
            // throw
          )
    ```
    vs.
    
    ```
        (for {
          inputScale <- scaleCharToFactor.get(inputScaleChar)
          outputScale <- scaleCharToFactor.get(outputScaleChar)
        } yield {
          inputScale * num / outputScale
        }).getOrElse(
            // throw
          )
    ```
    
    (I collapsed the `scale` wrapper line in the latter for apples-to-apples 
brevity, and can do that in the PR as well).
    
    So, it's not a "looping construct" so much as a "mapping" one, commonly 
used on `Option`s, `List`s, and [even things like twitter 
`Future`s](https://twitter.github.io/scala_school/finagle.html) (search for 
"for {"). 
    
    However, it does get better as the number of chained `map`s increases, e.g. 
especially when there are 3 or more, so I'm not that tied to it here.
    
    Of course, handling such situations using a `match` is also possible:
    
    ```
        (scaleCharToFactor.get(inputScaleChar), 
scaleCharToFactor.get(outputScaleChar)) match {
          case (Some(inputScale), Some(outputScale)) =>
            inputScale * num / outputScale
          case _ =>
            // throw
        }
    ```
    
    I prefer all of these to, say, the following straw-man that doesn't take 
advantage of any of the nice things that come from using `Option`s:
    
    ```
        if (scaleCharToFactor.get(inputScaleChar).isDefined && 
            scaleCharToFactor.get(outputScaleChar).isDefined)
          scaleCharToFactor.get(inputScaleChar).get * num / 
scaleCharToFactor.get(outputScaleChar).get
        else
          // throw
    ```
    
    but I'll defer to you on the approach you like best.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-4666] Improve YarnAllocator's parsing o...

Reply via email to