Spark Job is stuck at SUBMITTED when Driver Memory > Executor Memory

2017-06-04 Thread Abdulfattah Safa
I'm working with Spark in Standalone Cluster mode. I need to increase the
Driver Memory because I got an OOM in the driver thread. I found that when
the Driver Memory is set greater than the Executor Memory, the submitted job
is stuck at SUBMITTED and the application never starts.
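
For context, a likely explanation: in standalone cluster mode the driver
itself is launched on one of the workers, so a driver-memory request that no
worker can satisfy leaves the application parked in SUBMITTED. A minimal
sketch of the settings involved (the app name and sizes are assumptions, not
the poster's actual configuration):

import org.apache.spark.SparkConf;
import org.apache.spark.sql.SparkSession;

// In cluster deploy mode the driver JVM is sized before user code runs, so
// these values really belong on spark-submit (--driver-memory,
// --executor-memory) or in spark-defaults.conf; they appear in code here
// only for illustration.
SparkConf conf = new SparkConf()
        .setAppName("driver-memory-demo")
        .set("spark.driver.memory", "8g")      // must fit on some worker
        .set("spark.executor.memory", "4g");
SparkSession spark = SparkSession.builder().config(conf).getOrCreate();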


Re: Using SparkContext in Executors

2017-05-28 Thread Abdulfattah Safa
So I can't run SQL queries in Executors?

On Sun, May 28, 2017 at 11:00 PM Mark Hamstra wrote:

> You can't do that. SparkContext and SparkSession can exist only on the
> Driver.
>
> On Sun, May 28, 2017 at 6:56 AM, Abdulfattah Safa wrote:
>
>> How can I use SparkContext (to create a Spark Session or Cassandra
>> sessions) in executors?
>> If I pass it as a parameter to foreach or foreachPartition, it will have
>> a null value.
>> Shall I create a new SparkContext in each executor?
>>
>> Here is what I'm trying to do:
>> Read a dump directory with millions of dump files as follows:
>>
>> dumpFiles = Directory.listFiles(dumpDirectory)
>> dumpFilesRDD = sparkContext.parallelize(dumpFiles, numOfSlices)
>> dumpFilesRDD.foreachPartition(paths ->
>>     paths.forEachRemaining(dumpFilePath -> parse(dumpFilePath)))
>> .
>> .
>> .
>>
>> In parse(), each dump file is parsed and inserted into the database using
>> Spark SQL. To do that, SparkContext is needed inside parse() in order to
>> call the sql() method.
>>
>
>
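
For reference, a minimal sketch of the pattern Mark is pointing at: the
SparkContext stays on the driver, and each executor opens its own non-Spark
connection inside foreachPartition. The contact point, keyspace, statement,
and parse() body below are assumptions for illustration, not a confirmed
setup:

import java.util.Iterator;
import java.util.List;
import org.apache.spark.api.java.JavaSparkContext;
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public class PerPartitionInsert {
    public static void run(JavaSparkContext sc, List<String> dumpFiles) {
        sc.parallelize(dumpFiles, 1000).foreachPartition((Iterator<String> paths) -> {
            // one Cassandra connection per partition, opened on the executor
            try (Cluster cluster = Cluster.builder()
                                          .addContactPoint("cassandra-host")
                                          .build();
                 Session session = cluster.connect("my_keyspace")) {
                while (paths.hasNext()) {
                    String parsed = parse(paths.next());
                    session.execute(
                        "INSERT INTO my_table (column0) VALUES (?)", parsed);
                }
            }
        });
    }

    // stand-in for the parsing described in the original post
    private static String parse(String path) { return path; }
}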


Using SparkContext in Executors

2017-05-28 Thread Abdulfattah Safa
How can I use SparkContext (to create a Spark Session or Cassandra sessions)
in executors?
If I pass it as a parameter to foreach or foreachPartition, it will have a
null value.
Shall I create a new SparkContext in each executor?

Here is what I'm trying to do:
Read a dump directory with millions of dump files as follows:

dumpFiles = Directory.listFiles(dumpDirectory)
dumpFilesRDD = sparkContext.parallelize(dumpFiles, numOfSlices)
dumpFilesRDD.foreachPartition(paths ->
    paths.forEachRemaining(dumpFilePath -> parse(dumpFilePath)))
.
.
.

In parse(), each dump file is parsed and inserted into the database using
Spark SQL. To do that, SparkContext is needed inside parse() in order to call
the sql() method.
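
A common alternative, sketched under assumptions (the path and file layout
are placeholders): rather than calling sql() inside executors, let the driver
express the whole job with the DataFrame API, so Spark parallelizes the file
reads and the parsing itself:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.SparkSession;

public class ReadDumpsOnDriver {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("dump-loader")        // name is an assumption
                .getOrCreate();
        // one row per line across every file under the directory; parsing
        // can then run in parallel via map()/UDFs instead of sql() on
        // executors
        Dataset<String> dumps = spark.read().textFile("/path/to/dumpDirectory");
        System.out.println("lines read: " + dumps.count());
        spark.stop();
    }
}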


Cassandra Simple Insert Statement using Spark SQL Fails with org.apache.spark.sql.catalyst.parser.ParseException

2017-05-14 Thread Abdulfattah Safa
I'm trying to insert data into a Cassandra table with Spark SQL as follows:

String query = "CREATE TEMPORARY TABLE my_table USING
org.apache.spark.sql.cassandra OPTIONS (table \"my_table\", keyspace
\"my_keyspace\", pushdown \"true\")";
spark.sparkSession.sql(query);
spark.sparkSession.sql("INSERT INTO my_keyspace.my_table (column0, column1)
VALUES ('value0', 'value1')");

However, it fails with the following exception:
Exception in thread "main"
org.apache.spark.sql.catalyst.parser.ParseException:
mismatched input 'column0' expecting {'(', 'SELECT', 'FROM', 'VALUES',
'TABLE', 'INSERT', 'MAP', 'REDUCE'}(line 1, pos 33)

I tried it without the column names and it worked.
My point is that I want to insert data into only some of the columns, not all
of them.
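
For context, a hedged workaround sketch: the ParseException above shows that
Spark's SQL parser at that time did not accept a column list after INSERT
INTO. Writing a DataFrame that contains only the desired columns through the
spark-cassandra-connector achieves the partial-column insert; keyspace,
table, and column names below are the placeholders from the post:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;

public class PartialColumnInsert {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder().getOrCreate();
        // a one-row DataFrame holding just the columns we want to set
        Dataset<Row> row = spark.sql(
            "SELECT 'value0' AS column0, 'value1' AS column1");
        row.write()
           .format("org.apache.spark.sql.cassandra")
           .option("keyspace", "my_keyspace")
           .option("table", "my_table")
           .mode(SaveMode.Append)
           .save();
        spark.stop();
    }
}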

