[GitHub] spark pull request #16986: [SPARK-18891][SQL] Support for Scala Map collecti...

michalsenkyr Sun, 04 Jun 2017 06:24:25 -0700

Github user michalsenkyr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16986#discussion_r120009967
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala ---
    @@ -652,6 +653,299 @@ case class MapObjects private(
       }
     }
     
    +object CollectObjectsToMap {
    +  private val curId = new java.util.concurrent.atomic.AtomicInteger()
    +
    +  /**
    +   * Construct an instance of CollectObjects case class.
    +   *
    +   * @param keyFunction The function applied on the key collection elements.
    +   * @param keyInputData An expression that when evaluated returns a key collection object.
    +   * @param keyElementType The data type of key elements in the collection.
    +   * @param valueFunction The function applied on the value collection elements.
    +   * @param valueInputData An expression that when evaluated returns a value collection object.
    +   * @param valueElementType The data type of value elements in the collection.
    +   * @param collClass The type of the resulting collection.
    +   */
    +  def apply(
    +      keyFunction: Expression => Expression,
    +      keyInputData: Expression,
    +      keyElementType: DataType,
    +      valueFunction: Expression => Expression,
    +      valueInputData: Expression,
    +      valueElementType: DataType,
    +      collClass: Class[_]): CollectObjectsToMap = {
    +    val id = curId.getAndIncrement()
    +    val keyLoopValue = s"CollectObjectsToMap_keyLoopValue$id"
    +    val keyLoopIsNull = s"CollectObjectsToMap_keyLoopIsNull$id"
    --- End diff --
    
    Yes. A key in `MapData` cannot be null. However, since this expression takes two separate `ArrayData`s (keys and values) as input, I figured we shouldn't assume that this requirement is always fulfilled. As `CollectObjectsToMap` is a class separate from its usage in `ScalaReflection`, I tried to make it as generic and as similar to `MapObjects` as possible, so that it can be used elsewhere without having to make sure additional preconditions are met.
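    As a rough illustration of that defensive style, here is a minimal Scala sketch, not Spark's generated code. The hypothetical `ZipToMapSketch.zipToMap` helper pairs two parallel sequences (standing in for the key and value `ArrayData`s) into a `Map`, and checks the lengths and null keys itself instead of relying on the `MapData` invariant. The `allowNullKeys` flag is an assumption added only for illustration.

```scala
object ZipToMapSketch {
  /**
   * Hypothetical helper (not part of Spark): walks a key sequence and a value
   * sequence side by side and collects them into an immutable Scala Map.
   * It validates its own inputs rather than assuming MapData's invariants.
   */
  def zipToMap[K, V](keys: IndexedSeq[K], values: IndexedSeq[V], allowNullKeys: Boolean): Map[K, V] = {
    // Two independent arrays carry no guarantee that they have the same length.
    require(keys.length == values.length,
      s"key count ${keys.length} does not match value count ${values.length}")
    val builder = Map.newBuilder[K, V]
    var i = 0
    while (i < keys.length) {
      val key = keys(i)
      // Reject null keys only when the caller asks for it; the Map itself would store them.
      if (key == null && !allowNullKeys) {
        throw new NullPointerException(s"null key at index $i")
      }
      builder += (key -> values(i))
      i += 1
    }
    builder.result()
  }
}
```

    With `allowNullKeys = false` the helper enforces the same rule as `MapData`; with `true` it leaves the decision to the target `Map`.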
    `CollectObjectsToMap` also produces a generic `Map`, and some `Map` implementations can hold null keys. Right now, the only check that prevents null keys is [here](https://github.com/apache/spark/pull/16986/files/7af9b0625a245d34943b7192532c93f2aafac635#diff-e436c96ea839dfe446837ab2a3531f93R933). If there is ever a need to support such `Map`s in the future, this should make the job easier.
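    To make the point about null keys concrete, the following sketch (plain Scala plus the JDK, not Spark code; `NullKeySupportSketch` and its methods are hypothetical names) checks which common `Map` implementations accept a null key. It relies on documented JDK behavior: `java.util.HashMap` permits a null key, while `java.util.concurrent.ConcurrentHashMap` rejects one with a `NullPointerException`.

```scala
import java.util.concurrent.ConcurrentHashMap

object NullKeySupportSketch {
  // Tries to store a null key and reports whether the map allowed it.
  private def acceptsNullKey(put: (String, Integer) => Any): Boolean =
    try { put(null, Integer.valueOf(1)); true }
    catch { case _: NullPointerException => false }

  // Scala's immutable Map stores a null key like any other key.
  def scalaImmutableMap: Boolean =
    Map[String, Int]((null: String) -> 1).contains(null)

  // java.util.HashMap documents that it permits a null key.
  def javaHashMap: Boolean = {
    val m = new java.util.HashMap[String, Integer]()
    acceptsNullKey((k, v) => m.put(k, v))
  }

  // ConcurrentHashMap documents that null keys are not allowed.
  def javaConcurrentHashMap: Boolean = {
    val m = new ConcurrentHashMap[String, Integer]()
    acceptsNullKey((k, v) => m.put(k, v))
  }
}
```

    Whether a null key would survive therefore depends on the concrete collection class requested, which is why keeping the null check outside the generic expression leaves that door open.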



