[jira] [Created] (SPARK-43068) Create an allGather function with a byte array

2023-04-07 Thread Derek M Miller (Jira)
Derek M Miller created SPARK-43068:
--

 Summary: Create an allGather function with a byte array
 Key: SPARK-43068
 URL: https://issues.apache.org/jira/browse/SPARK-43068
 Project: Spark
  Issue Type: Improvement
  Components: PySpark, Spark Core
Affects Versions: 3.3.2
Reporter: Derek M Miller


The allGather function is very slow at the moment. I noticed a comment stating that 
a string is used for convenience at the expense of performance. It would be a decent 
improvement to have it accept a byte array instead (or to add a separate byte-array function).
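
For illustration, here is a minimal Scala sketch of how the current string-based API is 
used, with the byte-array variant shown only as a hypothetical signature (it does not 
exist in Spark today):

```scala
import org.apache.spark.BarrierTaskContext
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("allGatherSketch").getOrCreate()
val rdd = spark.sparkContext.parallelize(0 until 4, numSlices = 4)

// Current API: every barrier task exchanges a String with all the other tasks.
val sizes = rdd.barrier().mapPartitions { iter =>
  val ctx = BarrierTaskContext.get()
  val replies: Array[String] = ctx.allGather(s"payload from task ${ctx.partitionId()}")
  Iterator(replies.length)
}.collect()

// Hypothetical byte-array variant (not an existing API): passing raw bytes would
// avoid the string encode/decode round trip for large binary payloads.
// def allGather(payload: Array[Byte]): Array[Array[Byte]]
```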






[jira] [Commented] (SPARK-22584) dataframe write partitionBy out of disk/java heap issues

2017-11-23 Thread Derek M Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16264475#comment-16264475
 ] 

Derek M Miller commented on SPARK-22584:


So, in this case, the executor and driver were both given 16g of memory (i.e. 
--driver-memory 16g --executor-memory 16g). The dataframe was loaded from parquet. 
If I save the dataframe as is, with no partitions, I don't have any issues. If I save 
it with one partition, same thing. However, adding the second partition causes the job 
to need to write to disk. Every once in a while the error is in the driver, but I mostly 
see it in an executor (it isn't consistent). It ran out of memory in the middle of the 
partitionBy: it seemed to write a couple of partitions, then fail partway through the action.
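
As an aside, here is a sketch of one common mitigation for this write pattern (an 
assumption on my part, not something verified against this job): pre-shuffling by the 
partition columns so that each task holds at most one output partition, which keeps a 
single parquet writer open per task instead of one per distinct value the task happens to see.

```scala
import org.apache.spark.sql.SaveMode
import org.apache.spark.sql.functions.col

// Shuffle rows so every ("first", "second") combination lands in one task,
// then write; each task now buffers output for only one partition at a time.
df.repartition(col("first"), col("second"))
  .write
  .partitionBy("first", "second")
  .mode(SaveMode.Overwrite)
  .parquet(s"$location$example/$corrId/")
```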

> dataframe write partitionBy out of disk/java heap issues
> 
>
> Key: SPARK-22584
> URL: https://issues.apache.org/jira/browse/SPARK-22584
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Derek M Miller
>
> I have been seeing some issues with partitionBy for the dataframe writer. I 
> currently have a file that is 6mb, just for testing, and it has around 1487 
> rows and 21 columns. There is nothing out of the ordinary with the columns, 
> having either a DoubleType or StringType. The partitionBy calls two different 
> partitions with verified low cardinality. One partition has 30 unique values 
> and the other one has 2 unique values.
> ```scala
> df
> .write.partitionBy("first", "second")
> .mode(SaveMode.Overwrite)
> .parquet(s"$location$example/$corrId/")
> ```
> When running this example on Amazon's EMR with 5 r4.xlarges (30 gb of memory 
> each), I am getting a java heap out of memory error. I have 
> maximizeResourceAllocation set, and verified on the instances. I have even 
> set it to false, explicitly set the driver and executor memory to 16g, but 
> still had the same issue. Occasionally I get an error about disk space, and 
> the job seems to work if I use an r3.xlarge (that has the ssd). But that 
> seems weird that 6mb of data needs to spill to disk.
> The problem mainly seems to be centered around two + partitions vs 1. If I 
> just use either of the partitions only, I have no problems. It's also worth 
> noting that each of the partitions are evenly distributed.






[jira] [Commented] (SPARK-22584) dataframe write partitionBy out of disk/java heap issues

2017-11-23 Thread Derek M Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16264447#comment-16264447
 ] 

Derek M Miller commented on SPARK-22584:


I disagree. I should not be running out of memory on a file that is only 6 MB with 5 
instances that have 16 GB of memory. Even when the data is evenly distributed across 
partitions, I am still seeing this issue. I posted this on Stack Overflow, and it seems 
like others are experiencing it as well: 
https://stackoverflow.com/questions/47382977/spark-2-2-write-partitionby-out-of-memory-exception

> dataframe write partitionBy out of disk/java heap issues
> 
>
> Key: SPARK-22584
> URL: https://issues.apache.org/jira/browse/SPARK-22584
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Derek M Miller
>
> I have been seeing some issues with partitionBy for the dataframe writer. I 
> currently have a file that is 6mb, just for testing, and it has around 1487 
> rows and 21 columns. There is nothing out of the ordinary with the columns, 
> having either a DoubleType or StringType. The partitionBy calls two different 
> partitions with verified low cardinality. One partition has 30 unique values 
> and the other one has 2 unique values.
> ```scala
> df
> .write.partitionBy("first", "second")
> .mode(SaveMode.Overwrite)
> .parquet(s"$location$example/$corrId/")
> ```
> When running this example on Amazon's EMR with 5 r4.xlarges (30 gb of memory 
> each), I am getting a java heap out of memory error. I have 
> maximizeResourceAllocation set, and verified on the instances. I have even 
> set it to false, explicitly set the driver and executor memory to 16g, but 
> still had the same issue. Occasionally I get an error about disk space, and 
> the job seems to work if I use an r3.xlarge (that has the ssd). But that 
> seems weird that 6mb of data needs to spill to disk.
> The problem mainly seems to be centered around two + partitions vs 1. If I 
> just use either of the partitions only, I have no problems. It's also worth 
> noting that each of the partitions are evenly distributed.






[jira] [Updated] (SPARK-22584) dataframe write partitionBy out of disk/java heap issues

2017-11-22 Thread Derek M Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-22584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Derek M Miller updated SPARK-22584:
---
Description: 
I have been seeing some issues with partitionBy for the dataframe writer. I currently 
have a file that is 6 MB, just for testing, and it has around 1487 rows and 21 columns. 
There is nothing out of the ordinary with the columns, which are either DoubleType or 
StringType. The partitionBy call uses two different partition columns, both with 
verified low cardinality: one has 30 unique values and the other has 2.

```scala
df.write
  .partitionBy("first", "second")
  .mode(SaveMode.Overwrite)
  .parquet(s"$location$example/$corrId/")
```

When running this example on Amazon's EMR with 5 r4.xlarges (30 GB of memory each), I am 
getting a java heap out-of-memory error. I have maximizeResourceAllocation set, and have 
verified it on the instances. I have even set it to false and explicitly set the driver 
and executor memory to 16g, but still had the same issue. Occasionally I get an error 
about disk space, and the job seems to work if I use an r3.xlarge (which has an SSD). 
But it seems weird that 6 MB of data needs to spill to disk.

The problem mainly seems to be centered around using two or more partition columns 
versus one. If I use either column on its own, I have no problems. It's also worth 
noting that each of the partitions is evenly distributed.

  was:
I have been seeing some issues with partitionBy for the dataframe writer. I 
currently have a file that is 6mb, just for testing, and it has around 1487 
rows and 21 columns. There is nothing out of the ordinary with the columns, 
having either a DoubleType or String The partitionBy calls two different 
partitions with verified low cardinality. One partition has 30 unique values 
and the other one has 2 unique values.

```scala
df
.write.partitionBy("first", "second")
.mode(SaveMode.Overwrite)
.parquet(s"$location$example/$corrId/")
```

When running this example on Amazon's EMR with 5 r4.xlarges (30 gb of memory), 
I am getting a java heap out of memory error. I have maximizeResourceAllocation 
set, and verified on the instances. I have even set it to false, explicitly set 
the driver and executor memory to 16g, but still had the same issue. 
Occasionally I get an error about disk space, and the job seems to work if I 
use an r3.xlarge (that has the ssd). But that seems weird that 6mb of data 
needs to spill to disk.

The problem mainly seems to be centered around two + partitions vs 1. If I just 
use either of the partitions only, I have no problems. It's also worth noting 
that each of the partitions are evenly distributed.


> dataframe write partitionBy out of disk/java heap issues
> 
>
> Key: SPARK-22584
> URL: https://issues.apache.org/jira/browse/SPARK-22584
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Derek M Miller
>
> I have been seeing some issues with partitionBy for the dataframe writer. I 
> currently have a file that is 6mb, just for testing, and it has around 1487 
> rows and 21 columns. There is nothing out of the ordinary with the columns, 
> having either a DoubleType or StringType. The partitionBy calls two different 
> partitions with verified low cardinality. One partition has 30 unique values 
> and the other one has 2 unique values.
> ```scala
> df
> .write.partitionBy("first", "second")
> .mode(SaveMode.Overwrite)
> .parquet(s"$location$example/$corrId/")
> ```
> When running this example on Amazon's EMR with 5 r4.xlarges (30 gb of memory 
> each), I am getting a java heap out of memory error. I have 
> maximizeResourceAllocation set, and verified on the instances. I have even 
> set it to false, explicitly set the driver and executor memory to 16g, but 
> still had the same issue. Occasionally I get an error about disk space, and 
> the job seems to work if I use an r3.xlarge (that has the ssd). But that 
> seems weird that 6mb of data needs to spill to disk.
> The problem mainly seems to be centered around two + partitions vs 1. If I 
> just use either of the partitions only, I have no problems. It's also worth 
> noting that each of the partitions are evenly distributed.






[jira] [Created] (SPARK-22584) dataframe write partitionBy out of disk/java heap issues

2017-11-22 Thread Derek M Miller (JIRA)
Derek M Miller created SPARK-22584:
--

 Summary: dataframe write partitionBy out of disk/java heap issues
 Key: SPARK-22584
 URL: https://issues.apache.org/jira/browse/SPARK-22584
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.2.0
Reporter: Derek M Miller


I have been seeing some issues with partitionBy for the dataframe writer. I currently 
have a file that is 6 MB, just for testing, and it has around 1487 rows and 21 columns. 
There is nothing out of the ordinary with the columns, which are either DoubleType or 
StringType. The partitionBy call uses two different partition columns, both with 
verified low cardinality: one has 30 unique values and the other has 2.

```scala
df.write
  .partitionBy("first", "second")
  .mode(SaveMode.Overwrite)
  .parquet(s"$location$example/$corrId/")
```

When running this example on Amazon's EMR with 5 r4.xlarges (30 GB of memory), I am 
getting a java heap out-of-memory error. I have maximizeResourceAllocation set, and have 
verified it on the instances. I have even set it to false and explicitly set the driver 
and executor memory to 16g, but still had the same issue. Occasionally I get an error 
about disk space, and the job seems to work if I use an r3.xlarge (which has an SSD). 
But it seems weird that 6 MB of data needs to spill to disk.

The problem mainly seems to be centered around using two or more partition columns 
versus one. If I use either column on its own, I have no problems. It's also worth 
noting that each of the partitions is evenly distributed.






[jira] [Updated] (SPARK-19374) java.security.KeyManagementException: Default SSLContext is initialized automatically

2017-01-26 Thread Derek M Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Derek M Miller updated SPARK-19374:
---
Description: 
I am currently getting an SSL error when turning on SSL. I have also confirmed that 
nothing is wrong with the certificates. This is running on EMR with the following 
configuration for spark-defaults (passwords and paths changed, obviously):

{code}
  {
"Classification": "spark-defaults",
"Properties": {
  "spark.yarn.maxAppAttempts": "1",
  "spark.yarn.executor.memoryOverhead": "2048",
  "spark.ssl.enabled": "true",
  "spark.ssl.keyStore": "not_real_path/keystore.jks",
  "spark.ssl.keyStorePassword": "not_a_real_password",
  "spark.ssl.trustStore": "not_real_path/truststore.jks",
  "spark.ssl.trustStorePassword": "not_a_real_password"
}
  }
{code}

and I am getting the following exception:

{code}
17/01/26 20:02:31 INFO spark.SecurityManager: Changing view acls to: hadoop
17/01/26 20:02:31 INFO spark.SecurityManager: Changing modify acls to: hadoop
17/01/26 20:02:31 INFO spark.SecurityManager: SecurityManager: authentication 
disabled; ui acls disabled; users with view permissions: Set(hadoop); users 
with modify permissions: Set(hadoop)
17/01/26 20:02:31 INFO yarn.Client: Deleting staging directory 
.sparkStaging/application_1485460802835_0001
Exception in thread "main" java.security.KeyManagementException: Default 
SSLContext is initialized automatically
at 
sun.security.ssl.SSLContextImpl$DefaultSSLContext.engineInit(SSLContextImpl.java:749)
at javax.net.ssl.SSLContext.init(SSLContext.java:282)
at org.apache.spark.SecurityManager.<init>(SecurityManager.scala:284)
at 
org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:881)
at 
org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:142)
at org.apache.spark.deploy.yarn.Client.run(Client.scala:1021)
at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1081)
at org.apache.spark.deploy.yarn.Client.main(Client.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
at 
org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
17/01/26 20:02:31 INFO util.ShutdownHookManager: Shutdown hook called

{code}


  was:
I am currently getting an SSL error when turning on SSL. I have also confirmed that 
nothing is wrong with the certificates. This is running on EMR with the following 
configuration for spark-defaults:

{code}
  {
"Classification": "spark-defaults",
"Properties": {
  "spark.yarn.maxAppAttempts": "1",
  "spark.yarn.executor.memoryOverhead": "2048",
  "spark.ssl.enabled": "true",
  "spark.ssl.keyStore": "not_real_path/keystore.jks",
  "spark.ssl.keyStorePassword": "not_a_real_password",
  "spark.ssl.trustStore": "not_real_path/truststore.jks",
  "spark.ssl.trustStorePassword": "not_a_real_password"
}
  }
{code}

and I am getting the following exception:

{code}
17/01/26 20:02:31 INFO spark.SecurityManager: Changing view acls to: hadoop
17/01/26 20:02:31 INFO spark.SecurityManager: Changing modify acls to: hadoop
17/01/26 20:02:31 INFO spark.SecurityManager: SecurityManager: authentication 
disabled; ui acls disabled; users with view permissions: Set(hadoop); users 
with modify permissions: Set(hadoop)
17/01/26 20:02:31 INFO yarn.Client: Deleting staging directory 
.sparkStaging/application_1485460802835_0001
Exception in thread "main" java.security.KeyManagementException: Default 
SSLContext is initialized automatically
at 
sun.security.ssl.SSLContextImpl$DefaultSSLContext.engineInit(SSLContextImpl.java:749)
at javax.net.ssl.SSLContext.init(SSLContext.java:282)
at org.apache.spark.SecurityManager.<init>(SecurityManager.scala:284)
at 
org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:881)
at 
org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:142)
at org.apache.spark.deploy.yarn.Client.run(Client.scala:1021)
at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1081)
at org.apache.spark.deploy.yarn.Client.main(Client.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 

[jira] [Updated] (SPARK-19374) java.security.KeyManagementException: Default SSLContext is initialized automatically

2017-01-26 Thread Derek M Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Derek M Miller updated SPARK-19374:
---
Description: 
I am currently getting an SSL error when turning on SSL. I have also confirmed that 
nothing is wrong with the certificates. This is running on EMR with the following 
configuration for spark-defaults:

{code}
  {
"Classification": "spark-defaults",
"Properties": {
  "spark.yarn.maxAppAttempts": "1",
  "spark.yarn.executor.memoryOverhead": "2048",
  "spark.ssl.enabled": "true",
  "spark.ssl.keyStore": "not_real_path/keystore.jks",
  "spark.ssl.keyStorePassword": "not_a_real_password",
  "spark.ssl.trustStore": "not_real_path/truststore.jks",
  "spark.ssl.trustStorePassword": "not_a_real_password"
}
  }
{code}

and I am getting the following exception:

{code}
17/01/26 20:02:31 INFO spark.SecurityManager: Changing view acls to: hadoop
17/01/26 20:02:31 INFO spark.SecurityManager: Changing modify acls to: hadoop
17/01/26 20:02:31 INFO spark.SecurityManager: SecurityManager: authentication 
disabled; ui acls disabled; users with view permissions: Set(hadoop); users 
with modify permissions: Set(hadoop)
17/01/26 20:02:31 INFO yarn.Client: Deleting staging directory 
.sparkStaging/application_1485460802835_0001
Exception in thread "main" java.security.KeyManagementException: Default 
SSLContext is initialized automatically
at 
sun.security.ssl.SSLContextImpl$DefaultSSLContext.engineInit(SSLContextImpl.java:749)
at javax.net.ssl.SSLContext.init(SSLContext.java:282)
at org.apache.spark.SecurityManager.<init>(SecurityManager.scala:284)
at 
org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:881)
at 
org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:142)
at org.apache.spark.deploy.yarn.Client.run(Client.scala:1021)
at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1081)
at org.apache.spark.deploy.yarn.Client.main(Client.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
at 
org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
17/01/26 20:02:31 INFO util.ShutdownHookManager: Shutdown hook called

{code}


  was:
I am currently getting an SSL error when turning on SSL. I have also confirmed that 
nothing is wrong with the certificates. This is running on EMR with the following 
configuration for spark-defaults:

{code}
  {
"Classification": "spark-defaults",
"Properties": {
  "spark.yarn.maxAppAttempts": "1",
  "spark.yarn.executor.memoryOverhead": "2048",
  "spark.ssl.enabled": "true",
  "spark.ssl.keyStore": "not_real_path/keystore.jks",
  "spark.ssl.keyStorePassword": "not_a_real_password",
  "spark.ssl.trustStore": "not_real_path/truststore.jks",
  "spark.ssl.trustStorePassword": "not_a_real_password"
}
  }
{code}

and I am getting the following exception:

```
17/01/26 20:02:31 INFO spark.SecurityManager: Changing view acls to: hadoop
17/01/26 20:02:31 INFO spark.SecurityManager: Changing modify acls to: hadoop
17/01/26 20:02:31 INFO spark.SecurityManager: SecurityManager: authentication 
disabled; ui acls disabled; users with view permissions: Set(hadoop); users 
with modify permissions: Set(hadoop)
17/01/26 20:02:31 INFO yarn.Client: Deleting staging directory 
.sparkStaging/application_1485460802835_0001
Exception in thread "main" java.security.KeyManagementException: Default 
SSLContext is initialized automatically
at 
sun.security.ssl.SSLContextImpl$DefaultSSLContext.engineInit(SSLContextImpl.java:749)
at javax.net.ssl.SSLContext.init(SSLContext.java:282)
at org.apache.spark.SecurityManager.<init>(SecurityManager.scala:284)
at 
org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:881)
at 
org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:142)
at org.apache.spark.deploy.yarn.Client.run(Client.scala:1021)
at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1081)
at org.apache.spark.deploy.yarn.Client.main(Client.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 

[jira] [Updated] (SPARK-19374) java.security.KeyManagementException: Default SSLContext is initialized automatically

2017-01-26 Thread Derek M Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Derek M Miller updated SPARK-19374:
---
Description: 
I am currently getting an SSL error when turning on SSL. I have also confirmed that 
nothing is wrong with the certificates. This is running on EMR with the following 
configuration for spark-defaults:

{code}
  {
"Classification": "spark-defaults",
"Properties": {
  "spark.yarn.maxAppAttempts": "1",
  "spark.yarn.executor.memoryOverhead": "2048",
  "spark.ssl.enabled": "true",
  "spark.ssl.keyStore": "not_real_path/keystore.jks",
  "spark.ssl.keyStorePassword": "not_a_real_password",
  "spark.ssl.trustStore": "not_real_path/truststore.jks",
  "spark.ssl.trustStorePassword": "not_a_real_password"
}
  }
{code}

and I am getting the following exception:

```
17/01/26 20:02:31 INFO spark.SecurityManager: Changing view acls to: hadoop
17/01/26 20:02:31 INFO spark.SecurityManager: Changing modify acls to: hadoop
17/01/26 20:02:31 INFO spark.SecurityManager: SecurityManager: authentication 
disabled; ui acls disabled; users with view permissions: Set(hadoop); users 
with modify permissions: Set(hadoop)
17/01/26 20:02:31 INFO yarn.Client: Deleting staging directory 
.sparkStaging/application_1485460802835_0001
Exception in thread "main" java.security.KeyManagementException: Default 
SSLContext is initialized automatically
at 
sun.security.ssl.SSLContextImpl$DefaultSSLContext.engineInit(SSLContextImpl.java:749)
at javax.net.ssl.SSLContext.init(SSLContext.java:282)
at org.apache.spark.SecurityManager.<init>(SecurityManager.scala:284)
at 
org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:881)
at 
org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:142)
at org.apache.spark.deploy.yarn.Client.run(Client.scala:1021)
at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1081)
at org.apache.spark.deploy.yarn.Client.main(Client.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
at 
org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
17/01/26 20:02:31 INFO util.ShutdownHookManager: Shutdown hook called

```


  was:
I am currently getting an SSL error when turning on SSL. I have also confirmed that 
nothing is wrong with the certificates. This is running on EMR with the following 
configuration for spark-defaults:

```
  {
"Classification": "spark-defaults",
"Properties": {
  "spark.yarn.maxAppAttempts": "1",
  "spark.yarn.executor.memoryOverhead": "2048",
  "spark.ssl.enabled": "true",
  "spark.ssl.keyStore": "not_real_path/keystore.jks",
  "spark.ssl.keyStorePassword": "not_a_real_password",
  "spark.ssl.trustStore": "not_real_path/truststore.jks",
  "spark.ssl.trustStorePassword": "not_a_real_password"
}
  }
```

and I am getting the following exception:

```
17/01/26 20:02:31 INFO spark.SecurityManager: Changing view acls to: hadoop
17/01/26 20:02:31 INFO spark.SecurityManager: Changing modify acls to: hadoop
17/01/26 20:02:31 INFO spark.SecurityManager: SecurityManager: authentication 
disabled; ui acls disabled; users with view permissions: Set(hadoop); users 
with modify permissions: Set(hadoop)
17/01/26 20:02:31 INFO yarn.Client: Deleting staging directory 
.sparkStaging/application_1485460802835_0001
Exception in thread "main" java.security.KeyManagementException: Default 
SSLContext is initialized automatically
at 
sun.security.ssl.SSLContextImpl$DefaultSSLContext.engineInit(SSLContextImpl.java:749)
at javax.net.ssl.SSLContext.init(SSLContext.java:282)
at org.apache.spark.SecurityManager.<init>(SecurityManager.scala:284)
at 
org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:881)
at 
org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:142)
at org.apache.spark.deploy.yarn.Client.run(Client.scala:1021)
at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1081)
at org.apache.spark.deploy.yarn.Client.main(Client.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)

[jira] [Created] (SPARK-19374) java.security.KeyManagementException: Default SSLContext is initialized automatically

2017-01-26 Thread Derek M Miller (JIRA)
Derek M Miller created SPARK-19374:
--

 Summary: java.security.KeyManagementException: Default SSLContext 
is initialized automatically
 Key: SPARK-19374
 URL: https://issues.apache.org/jira/browse/SPARK-19374
 Project: Spark
  Issue Type: Bug
Reporter: Derek M Miller


I am currently getting an SSL error when turning on SSL. I have also confirmed that 
nothing is wrong with the certificates. This is running on EMR with the following 
configuration for spark-defaults:

```
  {
"Classification": "spark-defaults",
"Properties": {
  "spark.yarn.maxAppAttempts": "1",
  "spark.yarn.executor.memoryOverhead": "2048",
  "spark.ssl.enabled": "true",
  "spark.ssl.keyStore": "not_real_path/keystore.jks",
  "spark.ssl.keyStorePassword": "not_a_real_password",
  "spark.ssl.trustStore": "not_real_path/truststore.jks",
  "spark.ssl.trustStorePassword": "not_a_real_password"
}
  }
```

and I am getting the following exception:

```
17/01/26 20:02:31 INFO spark.SecurityManager: Changing view acls to: hadoop
17/01/26 20:02:31 INFO spark.SecurityManager: Changing modify acls to: hadoop
17/01/26 20:02:31 INFO spark.SecurityManager: SecurityManager: authentication 
disabled; ui acls disabled; users with view permissions: Set(hadoop); users 
with modify permissions: Set(hadoop)
17/01/26 20:02:31 INFO yarn.Client: Deleting staging directory 
.sparkStaging/application_1485460802835_0001
Exception in thread "main" java.security.KeyManagementException: Default 
SSLContext is initialized automatically
at 
sun.security.ssl.SSLContextImpl$DefaultSSLContext.engineInit(SSLContextImpl.java:749)
at javax.net.ssl.SSLContext.init(SSLContext.java:282)
at org.apache.spark.SecurityManager.<init>(SecurityManager.scala:284)
at 
org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:881)
at 
org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:142)
at org.apache.spark.deploy.yarn.Client.run(Client.scala:1021)
at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1081)
at org.apache.spark.deploy.yarn.Client.main(Client.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
at 
org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
17/01/26 20:02:31 INFO util.ShutdownHookManager: Shutdown hook called

```
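
One thing that may be worth checking (an assumption on my part, not something confirmed 
on this cluster): the failing frame is SecurityManager initializing an SSLContext, and 
Spark falls back to the JVM's default SSLContext, which cannot be re-initialized, when 
spark.ssl.protocol is left unset. A minimal sketch of the extra property, in the same 
classification format as above:

```
  {
    "Classification": "spark-defaults",
    "Properties": {
      "spark.ssl.enabled": "true",
      "spark.ssl.protocol": "TLSv1.2"
    }
  }
```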







[jira] [Updated] (SPARK-19079) Spark 1.6.1 SASL Error with Yarn

2017-01-04 Thread Derek M Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Derek M Miller updated SPARK-19079:
---
Description: 
Currently, there seems to be an issue when using SASL in Spark on YARN, at least with 
1.6.1. I wrote up a Stack Overflow question with the exact details of my configuration here: 
http://stackoverflow.com/questions/41453588/spark-sasl-not-working-on-the-emr-with-yarn

In short, I have added the spark.authenticate parameter to both the Hadoop and Spark 
configuration. I have also set these parameters:

```
  "spark.authenticate.enableSaslEncryption": "true",
  "spark.network.sasl.serverAlwaysEncrypt": "true"
```

However, I am consistently getting this error message:

```
java.lang.RuntimeException: java.lang.IllegalArgumentException: Unknown message 
type: -22
```

Further debugging has not been helpful. I think it is worth noting that this is all on 
Amazon's EMR as well. As a side note, even if this is not a bug, I think it would at the 
very least be worth updating the docs. The docs make it seem like you only need to add 
'spark.authenticate' to the Spark config, whereas it sounds like you actually need it in 
the Hadoop configuration as well.
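
To make that concrete, here is a sketch of how the EMR classifications might look with 
the setting in both places. This is an assumption about the shape of the fix (the YARN 
external shuffle service picking up spark.authenticate from yarn-site.xml), not a 
configuration I have verified end to end:

```
[
  {
    "Classification": "spark-defaults",
    "Properties": {
      "spark.authenticate": "true",
      "spark.authenticate.enableSaslEncryption": "true",
      "spark.network.sasl.serverAlwaysEncrypt": "true"
    }
  },
  {
    "Classification": "yarn-site",
    "Properties": {
      "spark.authenticate": "true"
    }
  }
]
```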


  was:
Currently, there seems to be an issue when using SASL in Spark on YARN, at least with 
1.6.1. I wrote up a Stack Overflow question with the exact details of my configuration here: 
http://stackoverflow.com/questions/41453588/spark-sasl-not-working-on-the-emr-with-yarn

In short, I have added the spark.authenticate parameter to both the Hadoop and Spark 
configuration. I have also set these parameters:

```
  "spark.authenticate.enableSaslEncryption": "true",
  "spark.network.sasl.serverAlwaysEncrypt": "true"
```

However, I am consistently getting this error message:

```
java.lang.RuntimeException: java.lang.IllegalArgumentException: Unknown message 
type: -22
```

Further debugging has not been helpful. I think it is worth noting that this is all on 
Amazon's EMR as well.



> Spark 1.6.1 SASL Error with Yarn
> 
>
> Key: SPARK-19079
> URL: https://issues.apache.org/jira/browse/SPARK-19079
> Project: Spark
>  Issue Type: Bug
>Reporter: Derek M Miller
>
> Currently, there seems to be an issue when using SASL in Spark with yarn with 
> at least 1.6.1. I wrote up a stackoverflow issue with the exact details of my 
> configuration here: 
> http://stackoverflow.com/questions/41453588/spark-sasl-not-working-on-the-emr-with-yarn
>  .
> In short, I have added the spark.authenticate parameter to the hadoop and 
> spark configuration. I also have these parameters as well.
> ```
>   "spark.authenticate.enableSaslEncryption": "true",
>   "spark.network.sasl.serverAlwaysEncrypt": "true"
> ```
> However, I am consistently getting this error message:
> ```
> java.lang.RuntimeException: java.lang.IllegalArgumentException: Unknown 
> message type: -22
> ```
> Further debugging has not been helpful. I think it is worth noting that this 
> is all on Amazon's emr as well. As a side note, even if this is not a bug, I 
> think it would at the very least be worth updating the docs. The docs make it 
> seem like you only need to add 'spark.authenticate' to the spark config, 
> where it sounds like you actually need it for the hadoop configuration as 
> well.






[jira] [Created] (SPARK-19079) Spark 1.6.1 SASL Error with Yarn

2017-01-04 Thread Derek M Miller (JIRA)
Derek M Miller created SPARK-19079:
--

 Summary: Spark 1.6.1 SASL Error with Yarn
 Key: SPARK-19079
 URL: https://issues.apache.org/jira/browse/SPARK-19079
 Project: Spark
  Issue Type: Bug
Reporter: Derek M Miller


Currently, there seems to be an issue when using SASL in Spark on YARN, at least with 
1.6.1. I wrote up a Stack Overflow question with the exact details of my configuration here: 
http://stackoverflow.com/questions/41453588/spark-sasl-not-working-on-the-emr-with-yarn

In short, I have added the spark.authenticate parameter to both the Hadoop and Spark 
configuration. I have also set these parameters:

```
  "spark.authenticate.enableSaslEncryption": "true",
  "spark.network.sasl.serverAlwaysEncrypt": "true"
```

However, I am consistently getting this error message:

```
java.lang.RuntimeException: java.lang.IllegalArgumentException: Unknown message 
type: -22
```

Further debugging has not been helpful. I think it is worth noting that this is all on 
Amazon's EMR as well.



