[jira] [Updated] (SPARK-6607) Aggregation attribute name including special chars '(' and ')' should be replaced before generating Parquet schema
[ https://issues.apache.org/jira/browse/SPARK-6607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cheng Lian updated SPARK-6607:
--
    Assignee: Liang-Chi Hsieh

Aggregation attribute name including special chars '(' and ')' should be replaced before generating Parquet schema
--
    Key: SPARK-6607
    URL: https://issues.apache.org/jira/browse/SPARK-6607
    Project: Spark
    Issue Type: Bug
    Components: SQL
    Reporter: Liang-Chi Hsieh
    Assignee: Liang-Chi Hsieh

'(' and ')' are special characters used in the Parquet schema for type annotation. When we run an aggregation query, we obtain attribute names such as MAX(a). If we store the resulting DataFrame directly as a Parquet file, reading and parsing the stored schema string fails.

Several methods can be adopted to solve this. This PR uses the simplest one: replacing the attribute names before generating the Parquet schema from those attributes. Another possible method is to change all aggregation expression names from func(column) to func[column].

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-6607) Aggregation attribute name including special chars '(' and ')' should be replaced before generating Parquet schema
[ https://issues.apache.org/jira/browse/SPARK-6607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cheng Lian updated SPARK-6607:
--
    Target Version/s: 1.4.0
    Affects Version/s: 1.1.1, 1.2.1, 1.3.0

Aggregation attribute name including special chars '(' and ')' should be replaced before generating Parquet schema
--
    Key: SPARK-6607
    URL: https://issues.apache.org/jira/browse/SPARK-6607
    Project: Spark
    Issue Type: Bug
    Components: SQL
    Affects Versions: 1.1.1, 1.2.1, 1.3.0
    Reporter: Liang-Chi Hsieh
    Assignee: Liang-Chi Hsieh

'(' and ')' are special characters used in the Parquet schema for type annotation. When we run an aggregation query, we obtain attribute names such as MAX(a). If we store the resulting DataFrame directly as a Parquet file, reading and parsing the stored schema string fails.

Several methods can be adopted to solve this. This PR uses the simplest one: replacing the attribute names before generating the Parquet schema from those attributes. Another possible method is to change all aggregation expression names from func(column) to func[column].
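
For illustration only, the two approaches named in the issue description (replacing '(' and ')' in attribute names before the Parquet schema is generated, or renaming func(column) to func[column]) might look roughly like the sketch below. This is not the code from the actual patch: the helper names and the choice of '_' as the replacement character are assumptions made for the example, and the sketch operates on plain strings rather than on Spark attributes.

```python
import re

# Hypothetical helpers; names and the "_" replacement are assumptions,
# not taken from the SPARK-6607 patch.
_PARENS = re.compile(r"[()]")


def sanitize_field_name(name, replacement="_"):
    """Replace every '(' and ')' in a field name with `replacement`."""
    return _PARENS.sub(replacement, name)


def sanitize_field_names(names, replacement="_"):
    """Sanitize a list of field names.

    Raises ValueError if two distinct names become identical after
    replacement, since a schema with duplicate field names would be ambiguous.
    """
    cleaned = [sanitize_field_name(n, replacement) for n in names]
    if len(set(cleaned)) != len(cleaned):
        raise ValueError("field names collide after sanitizing: %r" % cleaned)
    return cleaned


def to_bracket_form(name):
    """Alternative from the issue text: func(column) -> func[column]."""
    return name.replace("(", "[").replace(")", "]")


if __name__ == "__main__":
    print(sanitize_field_names(["MAX(a)", "b"]))
    print(to_bracket_form("MAX(a)"))
```

One design point the sketch makes visible: blind replacement can map two different attribute names to the same string, so some collision check (here a ValueError) is worth considering whichever replacement character is chosen.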