[jira] [Commented] (SPARK-28558) DatasetWriter partitionBy is changing the group file permissions in 2.4 for parquets
[ https://issues.apache.org/jira/browse/SPARK-28558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16936349#comment-16936349 ]

Stephen Pearson commented on SPARK-28558:
-----------------------------------------

[~holden] I am using MapR 5.1.0.

[~nladuguie] Have you tried setting the config below? It gave us a workaround (assuming the behaviour change doesn't affect your processes).

spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version=2

> DatasetWriter partitionBy is changing the group file permissions in 2.4 for
> parquets
> ---------------------------------------------------------------------------
>
>                 Key: SPARK-28558
>                 URL: https://issues.apache.org/jira/browse/SPARK-28558
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.4.3
>         Environment: Hadoop 2.7
> Scala 2.11
> Tested:
> * Spark 2.3.3 - Works
> * Spark 2.4.x - All have the same issue
>            Reporter: Stephen Pearson
>            Priority: Minor
>
> When writing a parquet using partitionBy the group file permissions are being
> changed as shown below. This causes members of the group to get
> "org.apache.hadoop.security.AccessControlException: Open failed for file
> error: Permission denied (13)"
> This worked in 2.3. I found a workaround which was to set
> "spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version=2" which gives
> the correct behaviour
>
> Code I used to reproduce issue:
> {quote}Seq(("H", 1), ("I", 2))
>  .toDF("Letter", "Number")
>  .write
>  .partitionBy("Letter")
>  .parquet(...){quote}
>
> {quote}sparktesting$ tree -dp
> ├── [drwxrws---] letter_testing2.3-defaults
> │   ├── [drwxrws---] Letter=H
> │   └── [drwxrws---] Letter=I
> ├── [drwxrws---] letter_testing2.4-defaults
> │   ├── [drwxrwS---] Letter=H
> │   └── [drwxrwS---] Letter=I
> └── [drwxrws---] letter_testing2.4-file-writer2
>     ├── [drwxrws---] Letter=H
>     └── [drwxrws---] Letter=I
> {quote}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
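A minimal Scala sketch of applying the workaround suggested in the comment programmatically, assuming Spark 2.4 and a local master purely for illustration. Keys with the `spark.hadoop.` prefix are copied, with the prefix stripped, into the Hadoop `Configuration` that the output committer reads. The object name, app name and output path are hypothetical; the write mirrors the reproduction in the issue description.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical helper object; not part of Spark or of the original report.
object CommitterV2Workaround {

  // Builds a session that uses the v2 FileOutputCommitter algorithm.
  // "spark.hadoop.*" keys end up (prefix stripped) in the
  // SparkContext's Hadoop Configuration.
  def session(): SparkSession =
    SparkSession.builder()
      .appName("SPARK-28558-workaround") // hypothetical app name
      .master("local[*]")                // assumption: local run for illustration
      .config("spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version", "2")
      .getOrCreate()

  // Same write as the reproduction in the issue; outputPath is a
  // placeholder for the path elided in the report.
  def writeSample(spark: SparkSession, outputPath: String): Unit = {
    import spark.implicits._
    Seq(("H", 1), ("I", 2))
      .toDF("Letter", "Number")
      .write
      .partitionBy("Letter")
      .parquet(outputPath)
  }
}
```

On the design choice: with algorithm version 2, each task's output is moved into the destination directory at task commit rather than in one rename at job commit, so a failed job can leave partial output behind. That is presumably the behaviour change the comment warns about.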
[jira] [Updated] (SPARK-28558) DatasetWriter partitionBy is changing the group file permissions in 2.4 for parquets
[ https://issues.apache.org/jira/browse/SPARK-28558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stephen Pearson updated SPARK-28558:
------------------------------------
    Description: 
When writing a parquet using partitionBy the group file permissions are being changed as shown below. This causes members of the group to get "org.apache.hadoop.security.AccessControlException: Open failed for file error: Permission denied (13)"

This worked in 2.3. I found a workaround which was to set "spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version=2" which gives the correct behaviour

 

Code I used to reproduce issue:
{quote}Seq(("H", 1), ("I", 2))
 .toDF("Letter", "Number")
 .write
 .partitionBy("Letter")
 .parquet(...){quote}

 

{quote}sparktesting$ tree -dp
├── [drwxrws---] letter_testing2.3-defaults
│   ├── [drwxrws---] Letter=H
│   └── [drwxrws---] Letter=I
├── [drwxrws---] letter_testing2.4-defaults
│   ├── [drwxrwS---] Letter=H
│   └── [drwxrwS---] Letter=I
└── [drwxrws---] letter_testing2.4-file-writer2
    ├── [drwxrws---] Letter=H
    └── [drwxrws---] Letter=I
{quote}

  was:
When writing a parquet using partitionBy the group file permissions are being changed as shown below. This causes members of the group to get "org.apache.hadoop.security.AccessControlException: Open failed for file error: Permission denied (13)"

This worked in 2.3.
I found a workaround which was to set "spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version=2" which gives the correct behaviour

sparktesting$ tree -dp
├── [drwxrws---] letter_testing2.3-defaults
│   ├── [drwxrws---] Letter=H
│   └── [drwxrws---] Letter=I
├── [drwxrws---] letter_testing2.4-defaults
│   ├── [drwxrwS---] Letter=H
│   └── [drwxrwS---] Letter=I
└── [drwxrws---] letter_testing2.4-file-writer2
    ├── [drwxrws---] Letter=H
    └── [drwxrws---] Letter=I
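The only difference between the 2.3 and 2.4 listings is the case of the group-execute character. In `ls -l`-style output, a lowercase `s` in that slot means setgid plus group execute, while an uppercase `S` means setgid with group execute cleared. On a directory, a missing execute bit stops group members from traversing into it, which is consistent with the `Permission denied` error in the report. The small Scala helper below (hypothetical names, operating on plain mode strings only) decodes that slot.

```scala
// Hypothetical helper for decoding the group-execute slot of an
// `ls -l`-style mode string such as "drwxrws---" (10 characters:
// file type, then the owner, group and other rwx triplets).
object ModeString {

  // type(0) + owner(1..3) + group(4..6): group execute is index 6
  private val GroupExecuteIndex = 6

  private def groupExecuteChar(mode: String): Char = {
    require(mode.length == 10, s"expected a 10-character mode string, got '$mode'")
    mode.charAt(GroupExecuteIndex)
  }

  // 'x' = execute, 's' = setgid + execute; 'S' and '-' mean no group execute.
  def groupCanExecute(mode: String): Boolean = {
    val c = groupExecuteChar(mode)
    c == 'x' || c == 's'
  }

  // 's' and 'S' both indicate that the setgid bit is set.
  def hasSetgid(mode: String): Boolean = {
    val c = groupExecuteChar(mode)
    c == 's' || c == 'S'
  }
}
```

For example, `ModeString.groupCanExecute("drwxrws---")` is true and `ModeString.groupCanExecute("drwxrwS---")` is false, while `hasSetgid` is true for both: the setgid bit survives, only group execute is lost.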
[jira] [Created] (SPARK-28558) DatasetWriter partitionBy is changing the group file permissions in 2.4 for parquets
Stephen Pearson created SPARK-28558:
---------------------------------------

             Summary: DatasetWriter partitionBy is changing the group file permissions in 2.4 for parquets
                 Key: SPARK-28558
                 URL: https://issues.apache.org/jira/browse/SPARK-28558
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 2.4.3
         Environment: Hadoop 2.7
Scala 2.11
Tested:
* Spark 2.3.3 - Works
* Spark 2.4.x - All have the same issue
            Reporter: Stephen Pearson


When writing a parquet using partitionBy the group file permissions are being changed as shown below. This causes members of the group to get "org.apache.hadoop.security.AccessControlException: Open failed for file error: Permission denied (13)"

This worked in 2.3. I found a workaround which was to set "spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version=2" which gives the correct behaviour

sparktesting$ tree -dp
├── [drwxrws---] letter_testing2.3-defaults
│   ├── [drwxrws---] Letter=H
│   └── [drwxrws---] Letter=I
├── [drwxrws---] letter_testing2.4-defaults
│   ├── [drwxrwS---] Letter=H
│   └── [drwxrwS---] Letter=I
└── [drwxrws---] letter_testing2.4-file-writer2
    ├── [drwxrws---] Letter=H
    └── [drwxrws---] Letter=I
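To spot affected partition directories without running `tree`, one option is to inspect them through the Hadoop `FileSystem` API. The sketch below is illustrative only: the object and method names are hypothetical, and it checks just the immediate children of the output directory (one level of `partitionBy` columns, such as the `Letter=...` directories above), returning those whose group permission lacks execute. It looks only at the group read/write/execute action, not at the setgid bit.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.hadoop.fs.permission.FsAction

// Hypothetical helper; not part of Spark or Hadoop.
object PartitionPermissionCheck {

  // Lists the immediate subdirectories of `root` whose group
  // permission does not include execute (i.e. group members
  // cannot traverse into them).
  def dirsWithoutGroupExecute(root: String, conf: Configuration): Seq[Path] = {
    val rootPath = new Path(root)
    val fs: FileSystem = rootPath.getFileSystem(conf)
    fs.listStatus(rootPath).toSeq
      .filter(_.isDirectory)
      .filterNot(_.getPermission.getGroupAction.implies(FsAction.EXECUTE))
      .map(_.getPath)
  }
}
```

Against the 2.4 defaults output above, such a check would be expected to flag both `Letter=H` and `Letter=I`, and nothing under the 2.3 or algorithm-version-2 outputs.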