Huon Wilson created SPARK-27278:
-----------------------------------

             Summary: SimplifyExtractValueOps doesn't optimise Literal map access to CASE WHEN ... END
                 Key: SPARK-27278
                 URL: https://issues.apache.org/jira/browse/SPARK-27278
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.4.0
         Environment: Spark 2.4.0
            Reporter: Huon Wilson


With a map that isn't constant-foldable, Spark will optimise a map access into a 
series of {{CASE WHEN ... THEN ... WHEN ... THEN ... END}} branches, for instance:

{code:none}
scala> spark.range(1000).select(map(lit(1), lit(1), lit(2), 'id)('id) as "x").explain
== Physical Plan ==
*(1) Project [CASE WHEN (cast(id#180L as int) = 1) THEN 1 WHEN (cast(id#180L as int) = 2) THEN id#180L END AS x#182L]
+- *(1) Range (0, 1000, step=1, splits=12)
{code}

This results in an efficient series of if/else checks in the generated code:

{code:java}
boolean project_isNull_3 = false;
int project_value_3 = -1;
if (!false) {
  project_value_3 = (int) project_expr_0_0;
}

boolean project_value_2 = false;
project_value_2 = project_value_3 == 1;
if (!false && project_value_2) {
  project_caseWhenResultState_0 = (byte)(false ? 1 : 0);
  project_project_value_1_0 = 1L;
  continue;
}

boolean project_isNull_8 = false;
int project_value_8 = -1;
if (!false) {
  project_value_8 = (int) project_expr_0_0;
}

boolean project_value_7 = false;
project_value_7 = project_value_8 == 2;
if (!false && project_value_7) {
  project_caseWhenResultState_0 = (byte)(false ? 1 : 0);
  project_project_value_1_0 = project_expr_0_0;
  continue;
}
{code}

If the map can be constant-folded, the constant folding happens first and the 
{{SimplifyExtractValueOps}} optimisation doesn't trigger, resulting in a map 
traversal and more dynamic checks:

{code:none}
scala> spark.range(1000).select(map(lit(1), lit(1), lit(2), lit(2))('id) as "x").explain
== Physical Plan ==
*(1) Project [keys: [1,2], values: [1,2][cast(id#195L as int)] AS x#197]
+- *(1) Range (0, 1000, step=1, splits=12)
{code}

The {{keys: ..., values: ...}} text is the string form of {{ArrayBasedMapData}}, 
which is what the {{Literal}} form of the {{map(...)}} expression in that select 
stores. The generated code is less efficient, since it has to traverse the map's 
array of keys at runtime, with type casts etc.:

{code:java}
int project_index_0 = 0;
boolean project_found_0 = false;
while (project_index_0 < project_length_0 && !project_found_0) {
  final int project_key_0 = project_keys_0.getInt(project_index_0);
  if (project_key_0 == project_value_2) {
    project_found_0 = true;
  } else {
    project_index_0++;
  }
}

if (!project_found_0) {
  project_isNull_0 = true;
} else {
  project_value_0 = project_values_0.getInt(project_index_0);
}
{code}
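
Until the optimiser handles the literal case, one possible workaround is to build the {{CASE WHEN}} chain by hand so that no map literal appears in the plan. The following is only a sketch under assumptions: the object name {{ExplicitCaseWhen}}, the helper {{lookup}} and the local {{SparkSession}} setup are made up for illustration, and the entries are assumed to be known in driver code. The explicit cast to {{int}} mirrors the cast in the plans above.

```scala
import org.apache.spark.sql.{Column, SparkSession}
import org.apache.spark.sql.functions.{col, when}

// Hypothetical helper, not part of Spark.
object ExplicitCaseWhen {
  // Builds CASE WHEN key = k1 THEN v1 WHEN key = k2 THEN v2 ... END.
  // With no otherwise(...), unmatched keys yield null, like a missing map key.
  def lookup(entries: Seq[(Int, Int)], key: Column): Column = {
    require(entries.nonEmpty, "need at least one entry")
    val (k0, v0) = entries.head
    entries.tail.foldLeft(when(key === k0, v0)) {
      case (acc, (k, v)) => acc.when(key === k, v)
    }
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a throwaway local session, only for inspecting the plan.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("explicit-case-when-sketch")
      .getOrCreate()

    val x = lookup(Seq(1 -> 1, 2 -> 2), col("id").cast("int")).as("x")
    spark.range(1000).select(x).explain()

    spark.stop()
  }
}
```

This keeps the lookup as plain comparisons, at the cost of spelling the entries out in driver code rather than as a {{map(...)}} expression.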

It looks like the problem is in {{SimplifyExtractValueOps}}, which doesn't 
handle {{GetMapValue(Literal(...), key)}}, only the {{CreateMap}} form:

{code:scala}
      case GetMapValue(CreateMap(elems), key) => CaseKeyWhen(key, elems)
{code}
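
To make the gap concrete, here is a minimal, self-contained sketch of the rewrite on a toy expression tree. The classes below only borrow Catalyst's names and are not Spark's real classes; {{LitMap}} is a made-up stand-in for a {{Literal}} holding map data, and the second {{case}} is a hypothetical addition, not the actual fix.

```scala
// Toy stand-ins for Catalyst expressions (assumption: simplified, not Spark's API).
sealed trait Expr
final case class Lit(value: Any) extends Expr
final case class Attr(name: String) extends Expr
// Children alternate key, value, key, value, ...
final case class CreateMap(children: Seq[Expr]) extends Expr
// Stand-in for a map that constant folding has already turned into a literal.
final case class LitMap(keys: Seq[Any], values: Seq[Any]) extends Expr
final case class GetMapValue(map: Expr, key: Expr) extends Expr
// Branches alternate condition value, result, condition value, result, ...
final case class CaseKeyWhen(key: Expr, branches: Seq[Expr]) extends Expr

object SimplifySketch {
  def simplify(e: Expr): Expr = e match {
    // Shape the existing rule handles: the map is still a CreateMap.
    case GetMapValue(CreateMap(elems), key) =>
      CaseKeyWhen(key, elems)
    // Hypothetical extra case: the map was folded to a literal first,
    // so re-expand its entries into alternating literal key/value branches.
    case GetMapValue(LitMap(keys, values), key) =>
      val branches: Seq[Expr] =
        keys.zip(values).flatMap { case (k, v) => Seq(Lit(k), Lit(v)) }
      CaseKeyWhen(key, branches)
    // Anything else is left untouched.
    case other => other
  }

  def main(args: Array[String]): Unit = {
    val id = Attr("id")
    // map(1, 1, 2, id)(id): not foldable, matched by the first case.
    println(simplify(GetMapValue(CreateMap(Seq(Lit(1), Lit(1), Lit(2), id)), id)))
    // map(1, 1, 2, 2)(id) after folding: only matched by the added case.
    println(simplify(GetMapValue(LitMap(Seq(1, 2), Seq(1, 2)), id)))
  }
}
```

A real rule would presumably also need to consider null handling, duplicate keys, and whether expanding a large literal map into many branches is still cheaper than the lookup loop.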


