Github user squito commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18853#discussion_r140802924
  
    --- Diff: docs/sql-programming-guide.md ---
    @@ -1460,6 +1460,13 @@ that these options will be deprecated in future release as more optimizations ar
          Configures the number of partitions to use when shuffling data for joins or aggregations.
         </td>
       </tr>
    +  <tr>
    +    <td><code>spark.sql.typeCoercion.mode</code></td>
    +    <td><code>default</code></td>
    +    <td>
    +        Whether compatible with Hive. Available options are <code>default</code> and <code>hive</code>.
    --- End diff --
    
    This description feels inadequate to me.  I think most users will think "hive" means "old, legacy way of doing things" and "default" means "new, better, Spark way of doing things".  But I haven't heard an argument in favor of the "default" behavior, just that we don't want a breaking change in behavior.
    
    So (a) I'd advocate that we rename "default" to "legacy", or something else 
along those lines.  I do think it should be the default value, to avoid 
changing behavior.
    and (b) I think the doc section here should more clearly indicate the difference, e.g. "The 'legacy' typeCoercion mode was used in Spark prior to 2.3, and so it continues to be the default to avoid breaking behavior.  However, it has logical inconsistencies.  The 'hive' mode is preferred for most new applications, though it may require additional manual casting."
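
    To make (a) concrete, opting in would presumably look something like this (just a sketch -- the config name and the "hive" value come from this PR's diff; "legacy" is only my proposed rename and doesn't exist anywhere yet):

    ```scala
    // In spark-shell, where `spark` is the predefined SparkSession.
    // "spark.sql.typeCoercion.mode" is the config proposed in this PR;
    // "hive" is one of its two documented values.
    spark.conf.set("spark.sql.typeCoercion.mode", "hive")

    // or equivalently at submit time:
    //   spark-submit --conf spark.sql.typeCoercion.mode=hive ...
    ```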
    
    I am even wondering if we should have a 3rd option that does not implicitly cast across type categories, e.g. like postgres, as this avoids nasty surprises for the user.  While the casts are convenient, when they don't work there is very little indication to the user that anything went wrong -- most likely they'll just continue processing data even though the results don't actually have the semantics they want.
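
    To illustrate the kind of silent failure I mean, here is a minimal sketch (this is my understanding of the current default coercion behavior; worth double-checking):

    ```scala
    // In spark-shell: comparing a string column to an integer literal forces an
    // implicit cross-category cast.  A string that doesn't parse as a number
    // becomes NULL under the cast, so its row silently drops out of the result
    // -- no error, no warning.
    import spark.implicits._

    Seq("1", "2", "2x").toDF("v").createOrReplaceTempView("t")
    spark.sql("SELECT * FROM t WHERE v = 2").show()
    // expected:
    // +---+
    // |  v|
    // +---+
    // |  2|
    // +---+
    // The "2x" row is gone, when arguably the user should have gotten an error.
    ```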

