[jira] [Commented] (SPARK-28558) DatasetWriter partitionBy is changing the group file permissions in 2.4 for parquets
[ https://issues.apache.org/jira/browse/SPARK-28558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16945755#comment-16945755 ]

Nicolas Laduguie commented on SPARK-28558:
------------------------------------------

Hi everybody, any news about a resolution of this issue?

> DatasetWriter partitionBy is changing the group file permissions in 2.4 for parquets
> ------------------------------------------------------------------------------------
>
>                 Key: SPARK-28558
>                 URL: https://issues.apache.org/jira/browse/SPARK-28558
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.4.3
>         Environment: Hadoop 2.7
> Scala 2.11
> Tested:
>  * Spark 2.3.3 - Works
>  * Spark 2.4.x - All have the same issue
>            Reporter: Stephen Pearson
>            Priority: Minor
>         Attachments: image-2019-09-24-09-20-07-225.png
>
> When writing a parquet using partitionBy the group file permissions are being
> changed as shown below. This causes members of the group to get
> "org.apache.hadoop.security.AccessControlException: Open failed for file
> error: Permission denied (13)"
> This worked in 2.3. I found a workaround which was to set
> "spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version=2" which gives
> the correct behaviour
>
> Code I used to reproduce issue:
> {quote}Seq(("H", 1), ("I", 2))
>  .toDF("Letter", "Number")
>  .write
>  .partitionBy("Letter")
>  .parquet(...){quote}
>
> {quote}sparktesting$ tree -dp
> ├── [drwxrws---] letter_testing2.3-defaults
> │   ├── [drwxrws---] Letter=H
> │   └── [drwxrws---] Letter=I
> ├── [drwxrws---] letter_testing2.4-defaults
> │   ├── [drwxrwS---] Letter=H
> │   └── [drwxrwS---] Letter=I
> └── [drwxrws---] letter_testing2.4-file-writer2
>     ├── [drwxrws---] Letter=H
>     └── [drwxrws---] Letter=I
> {quote}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
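The tree listing in the issue description differs only in one character of the group triad: `rws` under 2.3 and `rwS` under 2.4 defaults. In the usual `ls`/`tree` notation, a lowercase `s` in that position means setgid plus group execute, while an uppercase `S` means setgid without group execute. The following is a minimal sketch in plain Scala that decodes such strings to octal so the difference is explicit. The object and method names (`ModeString`, `parse`, `groupCanTraverse`) are made up for illustration and are not part of Spark or Hadoop.

```scala
object ModeString {
  // Scala has no octal literals, so parse octal digits explicitly.
  private def oct(s: String): Int = Integer.parseInt(s, 8)

  /** Converts a 10-character mode string such as "drwxrwS---" to its numeric mode. */
  def parse(mode: String): Int = {
    require(mode.length == 10, s"expected 10 characters, got '$mode'")
    val p = mode.substring(1) // drop the file-type character

    // Read and write positions contribute a fixed bit when not '-'.
    val rw = Seq(0 -> "400", 1 -> "200", 3 -> "40", 4 -> "20", 6 -> "4", 7 -> "2")
    val rwBits = rw.collect { case (i, bit) if p(i) != '-' => oct(bit) }.sum

    // Execute positions may also carry setuid, setgid, or sticky.
    // Lowercase s/t: special bit and execute. Uppercase S/T: special bit only.
    val exec = Seq((2, "100", "4000"), (5, "10", "2000"), (8, "1", "1000"))
    val xBits = exec.map { case (i, x, special) =>
      p(i) match {
        case 'x'       => oct(x)
        case 's' | 't' => oct(x) + oct(special)
        case 'S' | 'T' => oct(special)
        case _         => 0
      }
    }.sum

    rwBits + xBits
  }

  /** True when the group execute bit is set, which a directory needs for traversal. */
  def groupCanTraverse(mode: String): Boolean = (parse(mode) & oct("10")) != 0

  def main(args: Array[String]): Unit = {
    println(Integer.toOctalString(parse("drwxrws---"))) // prints 2770
    println(Integer.toOctalString(parse("drwxrwS---"))) // prints 2760
    println(groupCanTraverse("drwxrwS---"))             // prints false
  }
}
```

Read this way, the 2.4 partition directories are mode 2760 rather than 2770, so group members lack the execute bit on them, which would be consistent with the "Permission denied" error reported for group members.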
[jira] [Commented] (SPARK-28558) DatasetWriter partitionBy is changing the group file permissions in 2.4 for parquets
[ https://issues.apache.org/jira/browse/SPARK-28558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16936719#comment-16936719 ]

Nicolas Laduguie commented on SPARK-28558:
------------------------------------------

I debugged Spark and my understanding is as follows:
 * When you "partitionBy" a dataset in Spark, it generates a path in which to store the file (parquet in my case), including the partition folders.
 * When Spark saves the file, it writes the file directly rather than first creating the intermediate partition folders when they don't exist.
 * The permissions are calculated for the file (so 666 by default) and the umask is applied to these permissions (for example, 664 if the umask is 002).
 * The file is saved with those permissions, and the partition folders are also saved with those permissions.

This is probably why we can't apply the right permissions to the partition folders.
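The umask arithmetic described in the comment above can be sketched as follows. This is an illustration in plain Scala, not code taken from Spark or Hadoop; the names `UmaskDemo` and `applyUmask` are made up, and 777 is assumed as the usual default for directories.

```scala
object UmaskDemo {
  // Scala has no octal literals, so parse octal digits explicitly.
  private def oct(s: String): Int = Integer.parseInt(s, 8)

  /** Applies a umask to a requested mode: every bit set in the umask is cleared. */
  def applyUmask(requested: Int, umask: Int): Int = requested & ~umask

  /** True if no umask value can produce an execute bit from the given base mode. */
  def neverExecutable(base: Int): Boolean =
    (0 to oct("777")).forall(u => (applyUmask(base, u) & oct("111")) == 0)

  def main(args: Array[String]): Unit = {
    val fileDefault = oct("666") // file default quoted in the comment
    val dirDefault  = oct("777") // assumed directory default
    val umask       = oct("002")

    println(Integer.toOctalString(applyUmask(fileDefault, umask))) // prints 664
    println(Integer.toOctalString(applyUmask(dirDefault, umask)))  // prints 775
    println(neverExecutable(fileDefault))                          // prints true
  }
}
```

Because a umask can only clear bits, a directory whose mode is derived from the file default of 666 cannot gain an execute bit from any umask setting. If the hypothesis in the comment holds, that would explain why adjusting the umask alone does not restore directory traversal.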
[jira] [Commented] (SPARK-28558) DatasetWriter partitionBy is changing the group file permissions in 2.4 for parquets
[ https://issues.apache.org/jira/browse/SPARK-28558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16936504#comment-16936504 ]

Nicolas Laduguie commented on SPARK-28558:
------------------------------------------

FYI, here is a screenshot showing the permissions of the partition folders named "dt=*", which are 766, and the table folder permissions, which are 755.

!image-2019-09-24-09-20-07-225.png!
[jira] [Commented] (SPARK-28558) DatasetWriter partitionBy is changing the group file permissions in 2.4 for parquets
[ https://issues.apache.org/jira/browse/SPARK-28558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16936485#comment-16936485 ]

Nicolas Laduguie commented on SPARK-28558:
------------------------------------------

Hi [~spearson] [~holden], we are using MapR 6.0.1 (Hadoop 2.7.4) and the workaround you suggested (spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version=2) didn't work for me.

In fact, I can change the permissions of the partition folders slightly with the "spark.hadoop.fs.permissions.umask-mode" configuration, but it only affects the R and W permissions, not the X permission.

To describe my situation in a bit more detail: the table folder is created correctly by our Spark Structured Streaming job, that is to say with the correct permissions (775). The problem lies with the partition folders, which don't receive the correct X permission. They always stay at 764 or 766.
[jira] [Commented] (SPARK-28558) DatasetWriter partitionBy is changing the group file permissions in 2.4 for parquets
[ https://issues.apache.org/jira/browse/SPARK-28558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16936349#comment-16936349 ]

Stephen Pearson commented on SPARK-28558:
-----------------------------------------

[~holden] I am using MapR 5.1.0.

[~nladuguie] have you tried setting the config below? It worked as a workaround for us (assuming the behaviour change doesn't impact your processes).

spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version=2
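For readers who want to try the workaround mentioned above, here is a minimal sketch that sets the property on the session and reruns the repro from the issue description. It assumes Spark 2.4 on the classpath and is meant to be launched with spark-submit. The object name, the application name, and taking the output path from the first program argument are all assumptions for illustration, since the issue elides the actual path.

```scala
import org.apache.spark.sql.SparkSession

object CommitterV2Repro {
  def main(args: Array[String]): Unit = {
    // Hypothetical: the output directory is passed in; the issue does not state it.
    val outputPath = args(0)

    val spark = SparkSession.builder()
      .appName("SPARK-28558-repro") // arbitrary name chosen for this sketch
      // Workaround reported in the thread; it did not help every commenter.
      .config("spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version", "2")
      .getOrCreate()
    import spark.implicits._

    // Same data and write path as the repro in the issue description.
    Seq(("H", 1), ("I", 2))
      .toDF("Letter", "Number")
      .write
      .partitionBy("Letter")
      .parquet(outputPath)

    spark.stop()
  }
}
```

The same property can also be passed on the command line with `--conf` instead of being hard-coded. Note that switching the committer algorithm changes how task output is committed, which is the behaviour change the comment above cautions about.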
[jira] [Commented] (SPARK-28558) DatasetWriter partitionBy is changing the group file permissions in 2.4 for parquets
[ https://issues.apache.org/jira/browse/SPARK-28558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16936200#comment-16936200 ]

holdenk commented on SPARK-28558:
---------------------------------

What storage system are y'all using, [~nladuguie] & [~spearson]?
[jira] [Commented] (SPARK-28558) DatasetWriter partitionBy is changing the group file permissions in 2.4 for parquets
[ https://issues.apache.org/jira/browse/SPARK-28558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16935848#comment-16935848 ]

Nicolas Laduguie commented on SPARK-28558:
------------------------------------------

Hi, I am also facing this issue.

From my point of view, this is not just a "minor" bug, because it modifies the permissions of partition directories, which prevents users from accessing data.

The priority of this issue should be raised.