Aseem Bansal created SPARK-17307:
------------------------------------

             Summary: Document what access is needed on an S3 bucket when trying to save a model
                 Key: SPARK-17307
                 URL: https://issues.apache.org/jira/browse/SPARK-17307
             Project: Spark
          Issue Type: Documentation
            Reporter: Aseem Bansal


I faced this lack of documentation when I was trying to save a model to S3. 
Initially I thought only write access should be needed. Then I found it also needs 
delete access, to delete temporary files. I then requested delete access, tried 
again, and got the error

Exception in thread "main" org.apache.hadoop.fs.s3.S3Exception: org.jets3t.service.S3ServiceException: S3 PUT failed for '/dev-qa_%24folder%24' XML Error Message

To reproduce this error, the code below can be used:

{code}
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.ml.PipelineModel;
import org.apache.spark.sql.SparkSession;

SparkSession sparkSession = SparkSession
        .builder()
        .appName("my app")
        .master("local")
        .getOrCreate();

JavaSparkContext jsc = new JavaSparkContext(sparkSession.sparkContext());

jsc.hadoopConfiguration().set("fs.s3n.awsAccessKeyId", <ACCESS_KEY>);
jsc.hadoopConfiguration().set("fs.s3n.awsSecretAccessKey", <SECRET_ACCESS_KEY>);

// Create a PipelineModel

pipelineModel.write().overwrite().save("s3n://<BUCKET>/dev-qa/modelTest");
{code}

This back and forth could be avoided if the documentation clearly stated all of 
the access Spark needs to write to S3. It would also be great to explain why each 
of those permissions is needed.
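For reference, a minimal IAM policy along these lines might be what ends up being 
documented. The exact action list is an assumption inferred from the behaviour 
above (PUT for writing the model, DELETE for cleaning up temporary files, plus 
list/read access on the bucket for the output committer), not a confirmed 
requirement:

{code}
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": "arn:aws:s3:::<BUCKET>"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
      "Resource": "arn:aws:s3:::<BUCKET>/*"
    }
  ]
}
{code}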



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
