Spark job is stuck at SUBMITTED when setting Driver Memory > Executor Memory
I'm working on Spark in standalone cluster mode. I need to increase the driver memory because I got an OOM in the driver thread. I found that when I set the driver memory higher than the executor memory, the submitted job gets stuck at SUBMITTED and the application never starts.
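For context: in standalone cluster mode the driver itself runs on one of the workers, so a job typically stays in SUBMITTED when no worker has enough free memory to host the driver. It is worth checking in the Master web UI how much memory each worker advertises (capped by SPARK_WORKER_MEMORY in conf/spark-env.sh). A minimal submit sketch, assuming a master at spark://master:7077 and com.example.MyApp as a hypothetical main class:

    ./bin/spark-submit \
      --master spark://master:7077 \
      --deploy-mode cluster \
      --driver-memory 4g \
      --executor-memory 2g \
      --class com.example.MyApp \
      my-app.jar

With these numbers, the worker chosen for the driver must have at least 4g free on top of whatever its executors already occupy; otherwise the application sits in SUBMITTED waiting for capacity.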
Re: Using SparkContext in Executors
So I can't run SQL queries in executors?

On Sun, May 28, 2017 at 11:00 PM Mark Hamstra wrote:
> You can't do that. SparkContext and SparkSession can exist only on the Driver.
>
> On Sun, May 28, 2017 at 6:56 AM, Abdulfattah Safa wrote:
>> How can I use SparkContext (to create a SparkSession or Cassandra sessions) in executors?
>> If I pass it as a parameter to foreach or foreachPartition, it will have a null value.
>> Shall I create a new SparkContext in each executor?
>>
>> Here is what I'm trying to do: read a dump directory with millions of dump files as follows:
>>
>> dumpFiles = Directory.listFiles(dumpDirectory)
>> dumpFilesRDD = sparkContext.parallelize(dumpFiles, numOfSlices)
>> dumpFilesRDD.foreachPartition(dumpFilePath -> parse(dumpFilePath))
>>
>> In parse(), each dump file is parsed and inserted into the database using Spark SQL.
>> To do that, SparkContext is needed inside parse() to call the sql() method.
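The usual workaround is to keep the SparkSession strictly on the driver and ship only a pure parsing function to the executors. A minimal Java sketch, assuming jsc is a JavaSparkContext, spark is the driver-side SparkSession, and ParsedRecord/parseOne are a hypothetical serializable bean and parser (keyspace/table names are placeholders too):

    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;

    // Executors only run parseOne(), which touches no Spark classes.
    JavaRDD<String> paths = jsc.parallelize(dumpFiles, numOfSlices);
    JavaRDD<ParsedRecord> records = paths.map(p -> parseOne(p));

    // Back on the driver, where SparkSession is legal:
    Dataset<Row> df = spark.createDataFrame(records, ParsedRecord.class);
    df.write()
      .format("org.apache.spark.sql.cassandra")
      .option("keyspace", "my_keyspace")
      .option("table", "my_table")
      .mode("append")
      .save();

This keeps the SQL work (here the Cassandra write) on the driver while the heavy per-file parsing still runs in parallel on the executors.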
Using SparkContext in Executors
How can I use SparkContext (to create a SparkSession or Cassandra sessions) in executors? If I pass it as a parameter to foreach or foreachPartition, it will have a null value. Shall I create a new SparkContext in each executor?

Here is what I'm trying to do: read a dump directory with millions of dump files as follows:

dumpFiles = Directory.listFiles(dumpDirectory)
dumpFilesRDD = sparkContext.parallelize(dumpFiles, numOfSlices)
dumpFilesRDD.foreachPartition(dumpFilePath -> parse(dumpFilePath))

In parse(), each dump file is parsed and inserted into the database using Spark SQL. To do that, SparkContext is needed inside parse() to call the sql() method.
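If the inserts really have to happen on the executors, an alternative is to skip Spark SQL there entirely and open a plain Cassandra session per partition. A sketch, assuming the DataStax Java driver 4.x is on the executor classpath; the contact point, datacenter, keyspace/table names, and the ParsedRecord bean are all placeholders:

    import java.net.InetSocketAddress;
    import com.datastax.oss.driver.api.core.CqlSession;
    import com.datastax.oss.driver.api.core.cql.PreparedStatement;

    dumpFilesRDD.foreachPartition(iter -> {
        // One session per partition, built on the executor itself.
        try (CqlSession session = CqlSession.builder()
                .addContactPoint(new InetSocketAddress("127.0.0.1", 9042))
                .withLocalDatacenter("datacenter1")
                .withKeyspace("my_keyspace")
                .build()) {
            PreparedStatement ps = session.prepare(
                "INSERT INTO my_table (column0, column1) VALUES (?, ?)");
            while (iter.hasNext()) {
                ParsedRecord r = parse(iter.next()); // hypothetical parser
                session.execute(ps.bind(r.getColumn0(), r.getColumn1()));
            }
        }
    });

Because the session is created inside the lambda, nothing unserializable has to cross the driver/executor boundary.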
Cassandra Simple Insert Statement using Spark SQL Fails with org.apache.spark.sql.catalyst.parser.ParseException
I'm trying to insert data into a Cassandra table with Spark SQL as follows:

String query = "CREATE TEMPORARY TABLE my_table USING org.apache.spark.sql.cassandra OPTIONS (table \"my_table\", keyspace \"my_keyspace\", pushdown \"true\")";
spark.sparkSession.sql(query);
spark.sparkSession.sql("INSERT INTO my_keyspace.my_table (column0, column1) VALUES ('value0', 'value1')");

However, it fails with the following exception:

Exception in thread "main" org.apache.spark.sql.catalyst.parser.ParseException:
mismatched input 'column0' expecting {'(', 'SELECT', 'FROM', 'VALUES', 'TABLE', 'INSERT', 'MAP', 'REDUCE'}(line 1, pos 33)

I tried it without the column names and it worked. My point here is to insert data for some columns, not all of them.
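As far as I know, the Spark SQL parser in 2.x simply did not accept a column list after INSERT INTO (that is what the mismatched input 'column0' points at; support arrived in later releases). One workaround is to go through the DataFrame writer instead, which can write a subset of columns as long as the full Cassandra primary key is included. A sketch, assuming spark is the SparkSession and the Spark Cassandra Connector is on the classpath:

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;

    // One-row DataFrame with only the columns to be written.
    Dataset<Row> df = spark.sql("SELECT 'value0' AS column0, 'value1' AS column1");
    df.write()
      .format("org.apache.spark.sql.cassandra")
      .option("keyspace", "my_keyspace")
      .option("table", "my_table")
      .mode("append")
      .save();

Alternatively, keep the SQL INSERT but supply a value for every column of the table, using null for the ones you want to leave unset.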