[spark] branch branch-2.4 updated: [SPARK-30941][PYSPARK] Add a note to asDict to document its behavior when there are duplicate fields

dongjoon Mon, 09 Mar 2020 11:11:58 -0700

This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git



The following commit(s) were added to refs/heads/branch-2.4 by this push:
     new f378c7f  [SPARK-30941][PYSPARK] Add a note to asDict to document its behavior when there are duplicate fields
f378c7f is described below

commit f378c7fba29368ca32142a3b7fc169dabe6cb37f
Author: Liang-Chi Hsieh <vii...@gmail.com>
AuthorDate: Mon Mar 9 11:06:45 2020 -0700

    [SPARK-30941][PYSPARK] Add a note to asDict to document its behavior when there are duplicate fields
    
    ### What changes were proposed in this pull request?
    
    Adding a note to document `Row.asDict` behavior when there are duplicate fields.
    
    ### Why are the changes needed?
    
    When a row contains duplicate fields, `asDict` and `__getitem__` behave differently. We should document this so users are explicitly aware of the difference.
    
    ### Does this PR introduce any user-facing change?
    
    No. This is a documentation-only change.
    
    ### How was this patch tested?
    
    Existing tests.
    
    Closes #27853 from viirya/SPARK-30941.
    
    Authored-by: Liang-Chi Hsieh <vii...@gmail.com>
    Signed-off-by: Dongjoon Hyun <dongj...@apache.org>
    (cherry picked from commit d21aab403a0a32e8b705b38874c0b335e703bd5d)
    Signed-off-by: Dongjoon Hyun <dongj...@apache.org>
---
 python/pyspark/sql/types.py | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/python/pyspark/sql/types.py b/python/pyspark/sql/types.py
index 1d24c40..0d73963 100644
--- a/python/pyspark/sql/types.py
+++ b/python/pyspark/sql/types.py
@@ -1466,6 +1466,12 @@ class Row(tuple):
 
         :param recursive: turns the nested Row as dict (default: False).
 
+        .. note:: If a row contains duplicate field names, e.g., the rows of a join
+            between two :class:`DataFrame` that both have the fields of same names,
+            one of the duplicate fields will be selected by ``asDict``. ``__getitem__``
+            will also return one of the duplicate fields, however returned value might
+            be different to ``asDict``.
+
         >>> Row(name="Alice", age=11).asDict() == {'name': 'Alice', 'age': 11}
         True
         >>> row = Row(key=1, value=Row(name='a', age=2))

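To illustrate why the two accessors can disagree, here is a minimal plain-Python sketch. It assumes that, on branch-2.4, `Row.asDict` builds its result with `dict(zip(self.__fields__, self))` (so a later duplicate key overwrites an earlier one) and that `Row.__getitem__` resolves a name with `self.__fields__.index(name)` (so the first match wins). `ToyRow`, `as_dict`, and `lookup` are hypothetical names used only for this illustration; they are not PySpark APIs, and the note above deliberately does not promise which duplicate either method returns, so real code should not rely on this ordering.

```python
# Hypothetical stand-in for pyspark.sql.Row; NOT PySpark code.
# It models the two name-resolution strategies assumed above on a plain tuple.

class ToyRow(tuple):
    """A tuple paired with a list of (possibly duplicate) field names."""

    def __new__(cls, fields, values):
        row = super().__new__(cls, values)
        row.__fields__ = list(fields)  # tuple subclasses have a __dict__, so this is allowed
        return row

    def as_dict(self):
        # Like the assumed asDict: dict(zip(...)) keeps the LAST value for a repeated name.
        return dict(zip(self.__fields__, self))

    def lookup(self, name):
        # Like the assumed __getitem__: list.index finds the FIRST position of the name.
        return tuple.__getitem__(self, self.__fields__.index(name))


# e.g. a row from joining two DataFrames that both have an "id" column
row = ToyRow(["id", "name", "id"], [1, "Alice", 2])
print(row.as_dict())     # {'id': 2, 'name': 'Alice'}
print(row.lookup("id"))  # 1
```

The simplest way to avoid the ambiguity in practice is to make field names unique before collecting rows, for example by aliasing or dropping one of the clashing columns after the join.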

