[jira] [Commented] (SPARK-18017) Changing Hadoop parameter through sparkSession.sparkContext.hadoopConfiguration doesn't work

2016-10-20 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15591078#comment-15591078
 ] 

Sean Owen commented on SPARK-18017:
---

I don't believe you're intended to be able to modify the config parameters like 
this. You typically set all parameters in Hadoop config files, or with --conf 
on the command line.

> Changing Hadoop parameter through 
> sparkSession.sparkContext.hadoopConfiguration doesn't work
> 
>
> Key: SPARK-18017
> URL: https://issues.apache.org/jira/browse/SPARK-18017
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 2.0.0
> Environment: Scala version 2.11.8; Java 1.8.0_91; 
> com.databricks:spark-csv_2.11:1.2.0
>Reporter: Yuehua Zhang
>
> My Spark job tries to read CSV files on S3. I need to control the number of 
> partitions created, so I set the Hadoop parameter fs.s3n.block.size. However, it 
> stopped working after we upgraded Spark from 1.6.1 to 2.0.0. Not sure if it is 
> related to https://issues.apache.org/jira/browse/SPARK-15991. 




[jira] [Commented] (SPARK-18017) Changing Hadoop parameter through sparkSession.sparkContext.hadoopConfiguration doesn't work

2016-10-22 Thread Yuehua Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15598339#comment-15598339
 ] 

Yuehua Zhang commented on SPARK-18017:
--

Thanks for your input! I only want to change the parameter for one job, so I 
can't edit the Hadoop config files. As for the other option, if I add it 
through the spark-submit command I get "Warning: Ignoring non-spark config 
property: fs.s3n.block.size=524288000". 
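For reference, a reconstruction of the invocation that triggers that warning 
(everything after the flag is elided):

{code}
spark-submit --conf fs.s3n.block.size=524288000 ...
# => Warning: Ignoring non-spark config property: fs.s3n.block.size=524288000
{code}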
The reason I think this is related to the Spark upgrade is that this setting 
worked well on Spark 1.6 but stopped working after we upgraded to Spark 2.0.




[jira] [Commented] (SPARK-18017) Changing Hadoop parameter through sparkSession.sparkContext.hadoopConfiguration doesn't work

2016-10-24 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15601473#comment-15601473
 ] 

Sean Owen commented on SPARK-18017:
---

Ah, try spark.hadoop.fs.s3n.block.size=...
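That is, roughly (the rest of the spark-submit line elided):

{code}
spark-submit --conf spark.hadoop.fs.s3n.block.size=524288000 ...
{code}

Spark copies {{spark.hadoop.*}} properties into the Hadoop configuration with 
the prefix stripped, so the value should show up as {{fs.s3n.block.size}}.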




[jira] [Commented] (SPARK-18017) Changing Hadoop parameter through sparkSession.sparkContext.hadoopConfiguration doesn't work

2016-10-24 Thread Yuehua Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15602708#comment-15602708
 ] 

Yuehua Zhang commented on SPARK-18017:
--

Yeah, I tried that as well. It's not working either...




[jira] [Commented] (SPARK-18017) Changing Hadoop parameter through sparkSession.sparkContext.hadoopConfiguration doesn't work

2016-10-24 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15602917#comment-15602917
 ] 

Sean Owen commented on SPARK-18017:
---

You need to set it with --conf, not programmatically, I'd imagine.




[jira] [Commented] (SPARK-18017) Changing Hadoop parameter through sparkSession.sparkContext.hadoopConfiguration doesn't work

2016-10-24 Thread Yuehua Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15603181#comment-15603181
 ] 

Yuehua Zhang commented on SPARK-18017:
--

Yeah, that is what I did: "spark-submit --conf 
spark.hadoop.fs.s3n.block.size=524288000 ...". It did get rid of the non-spark 
config warning, though the block size still isn't taking effect. 
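One quick sanity check from the driver (a sketch; assumes the session object 
is named {{spark}}):

{code}
// did the prefixed --conf value actually reach the Hadoop configuration?
println(spark.sparkContext.hadoopConfiguration.get("fs.s3n.block.size"))
{code}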




[jira] [Commented] (SPARK-18017) Changing Hadoop parameter through sparkSession.sparkContext.hadoopConfiguration doesn't work

2016-11-07 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15643864#comment-15643864
 ] 

Steve Loughran commented on SPARK-18017:


You can check what's been picked up by grabbing a copy of the filesystem 
instance and then logging the value returned by {{getDefaultBlockSize()}}.

If you switch to S3A, which you should, calling {{toString()}} on the FS 
instance is generally sufficient to dump the block size and lots of other 
useful bits of information. Its relevant property is {{fs.s3a.block.size}}.
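A minimal sketch of that check (assumes a {{SparkSession}} named {{spark}}; 
the bucket name is hypothetical):

{code}
import java.net.URI
import org.apache.hadoop.fs.{FileSystem, Path}

val hadoopConf = spark.sparkContext.hadoopConfiguration
// the filesystem instance the job would use for this bucket
val fs = FileSystem.get(new URI("s3a://some-bucket/"), hadoopConf)
// the block size reads will actually see
println(fs.getDefaultBlockSize(new Path("/")))
// on S3A, toString() also dumps the block size and other diagnostics
println(fs)
{code}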


