GitHub user Sephiroth-Lin opened a pull request:

    https://github.com/apache/spark/pull/7209

    [SPARK-8811][SQL] Read array struct data from parquet error

    JIRA: https://issues.apache.org/jira/browse/SPARK-8811
    
    For example, suppose we have a table:
    ```
    t1(c1 string, c2 string,
       arr_c1 array<struct<in_c1 string, in_c2 string>>,
       arr_c2 array<struct<in_c1 string, in_c2 string>>)
    ```
    and we save its data in Parquet. For `select * from t1`, the fileSchema in the Parquet file may be:
    ```
    message hive_schema {
      optional binary c1;
      optional binary c2;
      optional group arr_c1 (LIST) {
        repeated group bag {
          optional group array_element {
            optional binary IN_C1;
            optional binary IN_C2;
          }
        }
      }
      optional group arr_c2 (LIST) {
        repeated group bag {
          optional group array_element {
            optional binary IN_C1;
            optional binary IN_C2;
          }
        }
      }
    }
    ```
    but the requestSchema is:
    ```
    message root {
      optional binary c1;
      optional binary c2;
      optional group arr_c1 (LIST) {
        repeated group bag {
          optional group element {
            optional binary IN_C1;
            optional binary IN_C2;
          }
        }
      }
      optional group arr_c2 (LIST) {
        repeated group bag {
          optional group element {
            optional binary IN_C1;
            optional binary IN_C2;
          }
        }
      }
    }
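    ```
    The group under each repeated `bag` is named `array_element` in the fileSchema but `element` in the requestSchema. The requested fields therefore do not match the fields in the file, and reading the `array<struct>` columns fails.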
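To make the name mismatch concrete, here is a toy model in Python, written purely for illustration. It is not Spark or parquet-mr code. It makes these assumptions:

- Requested fields are matched to file fields by exact name.
- Each Parquet group is represented as a nested dict.
- The names `FILE_SCHEMA`, `list_schema` and `missing_paths` are hypothetical.

If the list element group is named `element`, both array paths come back unmatched. If it is named `array_element`, which is the name this PR's commit switches to, every requested path is found.

```python
# Toy model (hypothetical, not Spark/parquet-mr code): a group is a dict mapping
# field name -> nested dict (a group) or a primitive type name (a str).
FILE_SCHEMA = {
    "c1": "binary",
    "c2": "binary",
    "arr_c1": {"bag": {"array_element": {"IN_C1": "binary", "IN_C2": "binary"}}},
    "arr_c2": {"bag": {"array_element": {"IN_C1": "binary", "IN_C2": "binary"}}},
}


def list_schema(element_name):
    """Build a requested schema whose list element group is called `element_name`."""
    return {
        "c1": "binary",
        "c2": "binary",
        "arr_c1": {"bag": {element_name: {"IN_C1": "binary", "IN_C2": "binary"}}},
        "arr_c2": {"bag": {element_name: {"IN_C1": "binary", "IN_C2": "binary"}}},
    }


def missing_paths(requested, file_schema, prefix=()):
    """Return dotted paths in `requested` with no same-named field in `file_schema`.

    Only names are compared; primitive types are not checked in this toy model.
    """
    missing = []
    for name, child in requested.items():
        path = prefix + (name,)
        if name not in file_schema:
            missing.append(".".join(path))
        elif isinstance(child, dict) and isinstance(file_schema[name], dict):
            # Both sides are groups: recurse into their children.
            missing.extend(missing_paths(child, file_schema[name], path))
    return missing


print(missing_paths(list_schema("element"), FILE_SCHEMA))
# ['arr_c1.bag.element', 'arr_c2.bag.element']
print(missing_paths(list_schema("array_element"), FILE_SCHEMA))
# []
```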


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/Sephiroth-Lin/spark SPARK-8811

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/7209.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #7209
    
----
commit ecd25477abd6735514ab48549a4a937bf6d00f42
Author: linweizhong <linweizh...@huawei.com>
Date:   2015-07-03T07:55:00Z

    Change schema for array type from element to array_element

----

