[jira] [Commented] (SPARK-28558) DatasetWriter partitionBy is changing the group file permissions in 2.4 for parquets

2019-09-23 Thread Stephen Pearson (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-28558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16936349#comment-16936349
 ] 

Stephen Pearson commented on SPARK-28558:
-

[~holden] I am using MapR 5.1.0

 

[~nladuguie] have you tried setting the config below? It worked as a 
workaround for us (assuming the change in behaviour doesn't affect your processes).

spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version=2
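
For reference, a minimal sketch of applying this setting when the session is built instead of passing it on the command line. The application name, the local master and the output path are assumptions made for illustration; the write itself is the reproduction code quoted in the issue. The `spark.hadoop.` prefix hands the property to the Hadoop configuration used by the job.

```scala
import org.apache.spark.sql.SparkSession

object CommitterV2Workaround {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("SPARK-28558-workaround") // hypothetical application name
      .master("local[*]")                // assumption: local run, for illustration only
      // The workaround from this comment: use commit algorithm version 2.
      .config("spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version", "2")
      .getOrCreate()

    import spark.implicits._ // needed for .toDF on a local Seq

    // Hypothetical output location; the issue does not state the real path.
    val outputPath = args.headOption.getOrElse("/tmp/letter_testing")

    Seq(("H", 1), ("I", 2))
      .toDF("Letter", "Number")
      .write
      .partitionBy("Letter")
      .parquet(outputPath)

    spark.stop()
  }
}
```

With version 2, task output is moved into the destination directory when each task commits rather than when the job commits, so a failed job may leave partial output behind. That is likely the behaviour change to weigh before adopting the workaround.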

> DatasetWriter partitionBy is changing the group file permissions in 2.4 for 
> parquets
> 
>
> Key: SPARK-28558
> URL: https://issues.apache.org/jira/browse/SPARK-28558
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 2.4.3
> Environment: Hadoop 2.7
> Scala 2.11
> Tested:
>  * Spark 2.3.3 - Works
>  * Spark 2.4.x - All have the same issue
> Reporter: Stephen Pearson
> Priority: Minor
>
> When writing Parquet output with partitionBy, the group permissions on the 
> partition directories are changed, as shown below. As a result, members of the 
> group get 
> "org.apache.hadoop.security.AccessControlException: Open failed for file 
> error: Permission denied (13)"
> This worked in 2.3. As a workaround, setting 
> "spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version=2" restores 
> the correct behaviour.
>  
> Code used to reproduce the issue:
> {quote}Seq(("H", 1), ("I", 2))
>  .toDF("Letter", "Number")
>  .write
>  .partitionBy("Letter")
>  .parquet(...){quote}
>  
> {quote}sparktesting$ tree -dp
> ├── [drwxrws---]  letter_testing2.3-defaults
> │   ├── [drwxrws---]  Letter=H
> │   └── [drwxrws---]  Letter=I
> ├── [drwxrws---]  letter_testing2.4-defaults
> │   ├── [drwxrwS---]  Letter=H
> │   └── [drwxrwS---]  Letter=I
> └── [drwxrws---]  letter_testing2.4-file-writer2
>     ├── [drwxrws---]  Letter=H
>     └── [drwxrws---]  Letter=I
> {quote}
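
The listings above differ only in the group execute slot: `s` means setgid and group execute are both set, while `S` means setgid is set without group execute. The sketch below decodes that slot from a mode string as printed by `tree -p`. The object, case class and function names are hypothetical and exist only for illustration; that the missing group execute bit is what triggers the AccessControlException is a plausible reading of the report, not something the issue confirms.

```scala
// Decode the group triplet of a mode string as printed by `tree -p` or `ls -l`.
object GroupBits {
  final case class GroupPerms(read: Boolean, write: Boolean, execute: Boolean, setgid: Boolean)

  // `mode` is the 10-character string without brackets, e.g. "drwxrwS---".
  def decodeGroup(mode: String): GroupPerms = {
    require(mode.length == 10, s"expected a 10-character mode string, got: $mode")
    val slot = mode.charAt(6) // group execute slot, which also carries the setgid flag
    GroupPerms(
      read = mode.charAt(4) == 'r',
      write = mode.charAt(5) == 'w',
      execute = slot == 'x' || slot == 's', // lowercase 's' implies execute is set
      setgid = slot == 's' || slot == 'S'   // either case means setgid is set
    )
  }

  def main(args: Array[String]): Unit = {
    // 2.3 defaults versus 2.4 defaults, taken from the listing in the issue.
    Seq("drwxrws---", "drwxrwS---").foreach { mode =>
      println(s"$mode -> ${decodeGroup(mode)}")
    }
  }
}
```

For the 2.4 default output (`drwxrwS---`), `execute` comes out false, meaning group members can list and modify the partition directory entries but cannot traverse into them.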



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-28558) DatasetWriter partitionBy is changing the group file permissions in 2.4 for parquets

2019-07-29 Thread Stephen Pearson (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen Pearson updated SPARK-28558:

Description: 
When writing Parquet output with partitionBy, the group permissions on the 
partition directories are changed, as shown below. As a result, members of the 
group get 
"org.apache.hadoop.security.AccessControlException: Open failed for file 
error: Permission denied (13)"

This worked in 2.3. As a workaround, setting 
"spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version=2" restores 
the correct behaviour.

Code used to reproduce the issue:
{quote}Seq(("H", 1), ("I", 2))
 .toDF("Letter", "Number")
 .write
 .partitionBy("Letter")
 .parquet(...){quote}
 
{quote}sparktesting$ tree -dp
├── [drwxrws---]  letter_testing2.3-defaults
│   ├── [drwxrws---]  Letter=H
│   └── [drwxrws---]  Letter=I
├── [drwxrws---]  letter_testing2.4-defaults
│   ├── [drwxrwS---]  Letter=H
│   └── [drwxrwS---]  Letter=I
└── [drwxrws---]  letter_testing2.4-file-writer2
    ├── [drwxrws---]  Letter=H
    └── [drwxrws---]  Letter=I
{quote}

  was:
When writing Parquet output with partitionBy, the group permissions on the 
partition directories are changed, as shown below. As a result, members of the 
group get 
"org.apache.hadoop.security.AccessControlException: Open failed for file 
error: Permission denied (13)"

This worked in 2.3. As a workaround, setting 
"spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version=2" restores 
the correct behaviour.

sparktesting$ tree -dp
├── [drwxrws---]  letter_testing2.3-defaults
│   ├── [drwxrws---]  Letter=H
│   └── [drwxrws---]  Letter=I
├── [drwxrws---]  letter_testing2.4-defaults
│   ├── [drwxrwS---]  Letter=H
│   └── [drwxrwS---]  Letter=I
└── [drwxrws---]  letter_testing2.4-file-writer2
    ├── [drwxrws---]  Letter=H
    └── [drwxrws---]  Letter=I








[jira] [Created] (SPARK-28558) DatasetWriter partitionBy is changing the group file permissions in 2.4 for parquets

2019-07-29 Thread Stephen Pearson (JIRA)
Stephen Pearson created SPARK-28558:
---

Summary: DatasetWriter partitionBy is changing the group file permissions in 2.4 for parquets
Key: SPARK-28558
URL: https://issues.apache.org/jira/browse/SPARK-28558
Project: Spark
Issue Type: Bug
Components: Spark Core
Affects Versions: 2.4.3
Environment: Hadoop 2.7
Scala 2.11
Tested:
 * Spark 2.3.3 - Works
 * Spark 2.4.x - All have the same issue
Reporter: Stephen Pearson


When writing Parquet output with partitionBy, the group permissions on the 
partition directories are changed, as shown below. As a result, members of the 
group get 
"org.apache.hadoop.security.AccessControlException: Open failed for file 
error: Permission denied (13)"

This worked in 2.3. As a workaround, setting 
"spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version=2" restores 
the correct behaviour.

sparktesting$ tree -dp
├── [drwxrws---]  letter_testing2.3-defaults
│   ├── [drwxrws---]  Letter=H
│   └── [drwxrws---]  Letter=I
├── [drwxrws---]  letter_testing2.4-defaults
│   ├── [drwxrwS---]  Letter=H
│   └── [drwxrwS---]  Letter=I
└── [drwxrws---]  letter_testing2.4-file-writer2
    ├── [drwxrws---]  Letter=H
    └── [drwxrws---]  Letter=I


