[jira] [Created] (SPARK-43068) Create an allGather function with a byte array
Derek M Miller created SPARK-43068:
-----------------------------------

Summary: Create an allGather function with a byte array
Key: SPARK-43068
URL: https://issues.apache.org/jira/browse/SPARK-43068
Project: Spark
Issue Type: Improvement
Components: PySpark, Spark Core
Affects Versions: 3.3.2
Reporter: Derek M Miller

The allGather function is very slow at the moment. I noticed a comment in the code stating that a string is used for convenience at the expense of performance. Switching to a byte array (or even adding a separate byte-array function) would be a decent improvement.
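For illustration, here is a minimal sketch of what exchanging binary data through the current API looks like, assuming Spark 3.x's barrier execution mode; the gatherBytes helper is hypothetical, not part of Spark:

```scala
import java.util.Base64
import org.apache.spark.BarrierTaskContext

// Hypothetical helper, called from inside a barrier stage, e.g.
// rdd.barrier().mapPartitions { iter => ... }.
def gatherBytes(payload: Array[Byte]): Seq[Array[Byte]] = {
  val ctx = BarrierTaskContext.get()
  // allGather only accepts and returns Strings today, so binary payloads
  // must be Base64-encoded on the way in and decoded on the way out --
  // the string-conversion overhead this ticket proposes to eliminate.
  val encoded = Base64.getEncoder.encodeToString(payload)
  ctx.allGather(encoded).map(s => Base64.getDecoder.decode(s)).toSeq
}
```

A bytes-native allGather would let callers skip the encode/decode round trip entirely.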
[jira] [Commented] (SPARK-22584) dataframe write partitionBy out of disk/java heap issues
[ https://issues.apache.org/jira/browse/SPARK-22584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16264475#comment-16264475 ]

Derek M Miller commented on SPARK-22584:
----------------------------------------

So, in this case, the executor and driver were both given 16g of memory (i.e. --driver-memory 16g --executor-memory 16g). The dataframe was loaded from Parquet. If I save the dataframe as is, with no partition columns, I don't have any issues; the same holds with a single partition column. However, adding the second partition column causes the job to need to write to disk. The error occasionally appears in the driver, but I mostly see it in an executor (it isn't consistent). It ran out of memory in the middle of the partitionBy: it seemed to write a couple of partitions, then fail in the middle of the action.
[jira] [Commented] (SPARK-22584) dataframe write partitionBy out of disk/java heap issues
[ https://issues.apache.org/jira/browse/SPARK-22584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16264447#comment-16264447 ]

Derek M Miller commented on SPARK-22584:
----------------------------------------

I disagree. I should not be running out of memory on a file of only 6 MB with 5 instances that have 16 GB of memory each. Even when the data is evenly distributed across partitions, I am still seeing this issue. I posted this on Stack Overflow, and it seems others are experiencing it as well: https://stackoverflow.com/questions/47382977/spark-2-2-write-partitionby-out-of-memory-exception
[jira] [Updated] (SPARK-22584) dataframe write partitionBy out of disk/java heap issues
[ https://issues.apache.org/jira/browse/SPARK-22584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Derek M Miller updated SPARK-22584:
-----------------------------------
Description:

I have been seeing some issues with partitionBy for the dataframe writer. I currently have a file that is 6 MB, just for testing, with around 1487 rows and 21 columns. There is nothing out of the ordinary with the columns, each being either a DoubleType or a StringType. The partitionBy call uses two different partition columns with verified low cardinality: one has 30 unique values and the other has 2.

```scala
df
  .write.partitionBy("first", "second")
  .mode(SaveMode.Overwrite)
  .parquet(s"$location$example/$corrId/")
```

When running this example on Amazon EMR with 5 r4.xlarge instances (30 GB of memory each), I am getting a Java heap out-of-memory error. I have maximizeResourceAllocation set, and verified it on the instances. I have even set it to false and explicitly set the driver and executor memory to 16g, but still had the same issue. Occasionally I get an error about disk space instead, and the job seems to work if I use an r3.xlarge (which has an SSD). But it seems odd that 6 MB of data needs to spill to disk.

The problem mainly seems to be centered on using two or more partition columns versus one. If I use either of the columns alone, I have no problems. It's also worth noting that each of the partitions is evenly distributed.
[jira] [Created] (SPARK-22584) dataframe write partitionBy out of disk/java heap issues
Derek M Miller created SPARK-22584:
-----------------------------------

Summary: dataframe write partitionBy out of disk/java heap issues
Key: SPARK-22584
URL: https://issues.apache.org/jira/browse/SPARK-22584
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 2.2.0
Reporter: Derek M Miller

I have been seeing some issues with partitionBy for the dataframe writer. I currently have a file that is 6 MB, just for testing, with around 1487 rows and 21 columns. There is nothing out of the ordinary with the columns, each being either a DoubleType or a StringType. The partitionBy call uses two different partition columns with verified low cardinality: one has 30 unique values and the other has 2.

```scala
df
  .write.partitionBy("first", "second")
  .mode(SaveMode.Overwrite)
  .parquet(s"$location$example/$corrId/")
```

When running this example on Amazon EMR with 5 r4.xlarge instances (30 GB of memory each), I am getting a Java heap out-of-memory error. I have maximizeResourceAllocation set, and verified it on the instances. I have even set it to false and explicitly set the driver and executor memory to 16g, but still had the same issue. Occasionally I get an error about disk space instead, and the job seems to work if I use an r3.xlarge (which has an SSD). But it seems odd that 6 MB of data needs to spill to disk.

The problem mainly seems to be centered on using two or more partition columns versus one. If I use either of the columns alone, I have no problems. It's also worth noting that each of the partitions is evenly distributed.
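A mitigation that is often suggested for this failure mode (a sketch, not a confirmed fix for this ticket): cluster the data by the partition columns before the write, so each task holds at most one open Parquet writer per output partition instead of potentially all 60 (30 x 2) at once, each buffering on the order of a Parquet block in memory. That buffering, rather than the 6 MB of input, is what can exhaust the heap.

```scala
import org.apache.spark.sql.SaveMode
import org.apache.spark.sql.functions.col

// Shuffle rows so that all rows for a given (first, second) pair land in
// the same task; each task then writes one file per value pair instead of
// keeping many Parquet writers (and their write buffers) open at once.
df.repartition(col("first"), col("second"))
  .write.partitionBy("first", "second")
  .mode(SaveMode.Overwrite)
  .parquet(s"$location$example/$corrId/")
```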
[jira] [Updated] (SPARK-19374) java.security.KeyManagementException: Default SSLContext is initialized automatically
[ https://issues.apache.org/jira/browse/SPARK-19374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Derek M Miller updated SPARK-19374:
-----------------------------------
Description:

I am currently getting an SSL error when turning on SSL. I have confirmed that nothing is wrong with the certificates. This is running on EMR and has the following configuration for spark-defaults (obviously changed the passwords and paths):

{code}
{
  "Classification": "spark-defaults",
  "Properties": {
    "spark.yarn.maxAppAttempts": "1",
    "spark.yarn.executor.memoryOverhead": "2048",
    "spark.ssl.enabled": "true",
    "spark.ssl.keyStore": "not_real_path/keystore.jks",
    "spark.ssl.keyStorePassword": "not_a_real_password",
    "spark.ssl.trustStore": "not_real_path/truststore.jks",
    "spark.ssl.trustStorePassword": "not_a_real_password"
  }
}
{code}

and I am getting the following exception:

{code}
17/01/26 20:02:31 INFO spark.SecurityManager: Changing view acls to: hadoop
17/01/26 20:02:31 INFO spark.SecurityManager: Changing modify acls to: hadoop
17/01/26 20:02:31 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoop); users with modify permissions: Set(hadoop)
17/01/26 20:02:31 INFO yarn.Client: Deleting staging directory .sparkStaging/application_1485460802835_0001
Exception in thread "main" java.security.KeyManagementException: Default SSLContext is initialized automatically
    at sun.security.ssl.SSLContextImpl$DefaultSSLContext.engineInit(SSLContextImpl.java:749)
    at javax.net.ssl.SSLContext.init(SSLContext.java:282)
    at org.apache.spark.SecurityManager.<init>(SecurityManager.scala:284)
    at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:881)
    at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:142)
    at org.apache.spark.deploy.yarn.Client.run(Client.scala:1021)
    at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1081)
    at org.apache.spark.deploy.yarn.Client.main(Client.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
17/01/26 20:02:31 INFO util.ShutdownHookManager: Shutdown hook called
{code}
[jira] [Created] (SPARK-19374) java.security.KeyManagementException: Default SSLContext is initialized automatically
Derek M Miller created SPARK-19374:
-----------------------------------

Summary: java.security.KeyManagementException: Default SSLContext is initialized automatically
Key: SPARK-19374
URL: https://issues.apache.org/jira/browse/SPARK-19374
Project: Spark
Issue Type: Bug
Reporter: Derek M Miller

I am currently getting an SSL error when turning on SSL. I have confirmed that nothing is wrong with the certificates. This is running on EMR and has the following configuration for spark-defaults:

```
{
  "Classification": "spark-defaults",
  "Properties": {
    "spark.yarn.maxAppAttempts": "1",
    "spark.yarn.executor.memoryOverhead": "2048",
    "spark.ssl.enabled": "true",
    "spark.ssl.keyStore": "not_real_path/keystore.jks",
    "spark.ssl.keyStorePassword": "not_a_real_password",
    "spark.ssl.trustStore": "not_real_path/truststore.jks",
    "spark.ssl.trustStorePassword": "not_a_real_password"
  }
}
```

and I am getting the following exception:

```
17/01/26 20:02:31 INFO spark.SecurityManager: Changing view acls to: hadoop
17/01/26 20:02:31 INFO spark.SecurityManager: Changing modify acls to: hadoop
17/01/26 20:02:31 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoop); users with modify permissions: Set(hadoop)
17/01/26 20:02:31 INFO yarn.Client: Deleting staging directory .sparkStaging/application_1485460802835_0001
Exception in thread "main" java.security.KeyManagementException: Default SSLContext is initialized automatically
    at sun.security.ssl.SSLContextImpl$DefaultSSLContext.engineInit(SSLContextImpl.java:749)
    at javax.net.ssl.SSLContext.init(SSLContext.java:282)
    at org.apache.spark.SecurityManager.<init>(SecurityManager.scala:284)
    at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:881)
    at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:142)
    at org.apache.spark.deploy.yarn.Client.run(Client.scala:1021)
    at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1081)
    at org.apache.spark.deploy.yarn.Client.main(Client.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
17/01/26 20:02:31 INFO util.ShutdownHookManager: Shutdown hook called
```
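For what it's worth, the exception text comes straight from the JDK: the default SSLContext is created pre-initialized, and calling init() on it always throws. A two-line sketch that reproduces the same KeyManagementException outside of Spark (the failing frame at SecurityManager.scala:284 suggests Spark hit this path when spark.ssl.enabled is true):

```scala
import javax.net.ssl.SSLContext

val ctx = SSLContext.getDefault // already initialized by the JDK
// Throws java.security.KeyManagementException:
//   "Default SSLContext is initialized automatically"
ctx.init(null, null, null)
```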
[jira] [Updated] (SPARK-19079) Spark 1.6.1 SASL Error with Yarn
[ https://issues.apache.org/jira/browse/SPARK-19079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Derek M Miller updated SPARK-19079:
-----------------------------------
Description:

Currently, there seems to be an issue when using SASL in Spark on YARN, at least in 1.6.1. I wrote up a Stack Overflow question with the exact details of my configuration here: http://stackoverflow.com/questions/41453588/spark-sasl-not-working-on-the-emr-with-yarn

In short, I have added the spark.authenticate parameter to both the Hadoop and Spark configuration. I also have these parameters set:

```
"spark.authenticate.enableSaslEncryption": "true",
"spark.network.sasl.serverAlwaysEncrypt": "true"
```

However, I am consistently getting this error message:

```
java.lang.RuntimeException: java.lang.IllegalArgumentException: Unknown message type: -22
```

Further debugging has not been helpful. It is worth noting that this is all on Amazon EMR. As a side note, even if this is not a bug, it would at the very least be worth updating the docs: they make it seem like you only need to add spark.authenticate to the Spark config, when it sounds like you actually need it in the Hadoop configuration as well.
[jira] [Created] (SPARK-19079) Spark 1.6.1 SASL Error with Yarn
Derek M Miller created SPARK-19079:
-----------------------------------

Summary: Spark 1.6.1 SASL Error with Yarn
Key: SPARK-19079
URL: https://issues.apache.org/jira/browse/SPARK-19079
Project: Spark
Issue Type: Bug
Reporter: Derek M Miller

Currently, there seems to be an issue when using SASL in Spark on YARN, at least in 1.6.1. I wrote up a Stack Overflow question with the exact details of my configuration here: http://stackoverflow.com/questions/41453588/spark-sasl-not-working-on-the-emr-with-yarn

In short, I have added the spark.authenticate parameter to both the Hadoop and Spark configuration. I also have these parameters set:

```
"spark.authenticate.enableSaslEncryption": "true",
"spark.network.sasl.serverAlwaysEncrypt": "true"
```

However, I am consistently getting this error message:

```
java.lang.RuntimeException: java.lang.IllegalArgumentException: Unknown message type: -22
```

Further debugging has not been helpful. It is worth noting that this is all on Amazon EMR.
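Based on the reporter's description (and not independently verified), the layout being attempted would put spark.authenticate on both sides: in the EMR spark-defaults classification and in a Hadoop-side classification such as core-site. A sketch in the same EMR configuration format used above; the core-site placement is the reporter's stated assumption, not a documented requirement:

```
[
  {
    "Classification": "spark-defaults",
    "Properties": {
      "spark.authenticate": "true",
      "spark.authenticate.enableSaslEncryption": "true",
      "spark.network.sasl.serverAlwaysEncrypt": "true"
    }
  },
  {
    "Classification": "core-site",
    "Properties": {
      "spark.authenticate": "true"
    }
  }
]
```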