[jira] [Commented] (SPARK-18017) Changing Hadoop parameter through sparkSession.sparkContext.hadoopConfiguration doesn't work
[ https://issues.apache.org/jira/browse/SPARK-18017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15591078#comment-15591078 ] Sean Owen commented on SPARK-18017:

I don't believe you're intended to be able to modify the config parameters like this. You typically set all parameters in Hadoop config files, or with --conf on the command line.

> Changing Hadoop parameter through sparkSession.sparkContext.hadoopConfiguration doesn't work
>
> Key: SPARK-18017
> URL: https://issues.apache.org/jira/browse/SPARK-18017
> Project: Spark
> Issue Type: Bug
> Affects Versions: 2.0.0
> Environment: Scala version 2.11.8; Java 1.8.0_91; com.databricks:spark-csv_2.11:1.2.0
> Reporter: Yuehua Zhang
>
> My Spark job tries to read csv files on S3. I need to control the number of partitions created, so I set the Hadoop parameter fs.s3n.block.size. However, it stopped working after we upgraded Spark from 1.6.1 to 2.0.0. Not sure if it is related to https://issues.apache.org/jira/browse/SPARK-15991.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[ https://issues.apache.org/jira/browse/SPARK-18017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15598339#comment-15598339 ] Yuehua Zhang commented on SPARK-18017:

Thanks for your input! I only want to change the parameter for one job, so I can't edit the Hadoop config file. For the other option, if I add it through the spark-submit command I get "Warning: Ignoring non-spark config property: fs.s3n.block.size=524288000". The reason I think this is related to the Spark upgrade is that this setting worked well on Spark 1.6 but stopped working after we upgraded to Spark 2.0.
[ https://issues.apache.org/jira/browse/SPARK-18017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15601473#comment-15601473 ] Sean Owen commented on SPARK-18017:

Ah, try spark.hadoop.fs.s3n.block.size=...
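For context on why the spark.hadoop. prefix is suggested: Spark copies any property named spark.hadoop.* into the Hadoop Configuration it builds for the job, with the prefix stripped, while plain Hadoop keys passed to --conf are rejected as "non-spark" properties. A minimal pure-Scala sketch of that key mapping (a stand-in for illustration, not Spark's actual internals; the object and property values here are hypothetical):

```scala
// Sketch of the spark.hadoop.* -> Hadoop key mapping that makes
// `--conf spark.hadoop.fs.s3n.block.size=...` reach the Hadoop Configuration.
object SparkHadoopPrefix {
  val Prefix = "spark.hadoop."

  /** Keep only spark.hadoop.* entries, with the prefix stripped. */
  def toHadoopConf(sparkConf: Map[String, String]): Map[String, String] =
    sparkConf.collect {
      case (k, v) if k.startsWith(Prefix) => k.stripPrefix(Prefix) -> v
    }

  def main(args: Array[String]): Unit = {
    val conf = Map(
      "spark.master" -> "local[*]",                   // not a Hadoop key: dropped
      "spark.hadoop.fs.s3n.block.size" -> "524288000" // becomes fs.s3n.block.size
    )
    println(toHadoopConf(conf)) // Map(fs.s3n.block.size -> 524288000)
  }
}
```

A bare fs.s3n.block.size passed to --conf never makes it through this mapping, which is consistent with the "Ignoring non-spark config property" warning reported above.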
[ https://issues.apache.org/jira/browse/SPARK-18017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15602708#comment-15602708 ] Yuehua Zhang commented on SPARK-18017:

Yeah, I tried that also. Not working either...
[ https://issues.apache.org/jira/browse/SPARK-18017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15602917#comment-15602917 ] Sean Owen commented on SPARK-18017:

You need to set it with --conf, not programmatically, I'd imagine.
[ https://issues.apache.org/jira/browse/SPARK-18017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15603181#comment-15603181 ] Yuehua Zhang commented on SPARK-18017:

Yeah, that is what I did: "spark-submit --conf spark.hadoop.fs.s3n.block.size=524288000 ...". It did get rid of the non-spark config warning, though.
[ https://issues.apache.org/jira/browse/SPARK-18017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15643864#comment-15643864 ] Steve Loughran commented on SPARK-18017:

You can check what's been picked up by grabbing a copy of the filesystem instance and then logging the value returned by {{getDefaultBlockSize()}}. If you switch to s3a, which you should, calling toString() on the FS instance is generally sufficient to dump the block size and lots of other useful bits of information. Its relevant property is {{fs.s3a.block.size}}.
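The block size matters for partitioning because Hadoop's FileInputFormat derives input splits (and hence Spark partitions) from the file size divided by the split size, which defaults to the filesystem's reported block size. A rough pure-Scala sketch of that relationship, under the simplifying assumption of one split per block (the real computation also honours min/max split-size settings and a slop factor):

```scala
// Simplified model of FileInputFormat-style split counting: roughly one
// split per block, i.e. ceil(fileSize / blockSize), with at least one
// split per non-empty file.
object SplitEstimate {
  /** Approximate split count for one file: ceil(fileSize / blockSize). */
  def numSplits(fileSizeBytes: Long, blockSizeBytes: Long): Long = {
    require(blockSizeBytes > 0, "block size must be positive")
    if (fileSizeBytes == 0) 1L
    else (fileSizeBytes + blockSizeBytes - 1) / blockSizeBytes
  }

  def main(args: Array[String]): Unit = {
    val twoGiB = 2L * 1024 * 1024 * 1024
    // A 64 MiB block size vs. the 500 MB value from this ticket.
    println(numSplits(twoGiB, 64L * 1024 * 1024)) // 32
    println(numSplits(twoGiB, 524288000L))        // 5
  }
}
```

So if the configured fs.s3n.block.size (or fs.s3a.block.size) is silently ignored and the filesystem falls back to its default, the partition count reverts accordingly, which is exactly the symptom reported in this ticket.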