Re: Spark Shell issue on HDInsight

2017-05-14 Thread ayan guha
Oh, OK Denny, great! Also, thanks for your effort in resolving my issue.

Can I ask one more (more open-ended) question? We have a requirement where
we want to read data from either Blob storage or a Hive table and upsert a few
records into CosmosDB.

One option is to run a C# activity on a Windows batch pool. The other option is
to use Spark.

Do you have an opinion on either? I know Spark, so I am a little biased towards
the second option and can think of benefits around it, but I want to be sure I
am not missing out on a better solution.
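
To make the Spark option concrete, this is roughly the shape I have in mind. It
is a rough, untested sketch only: the connector format name, the option keys,
the upsert behavior, and all table/column/endpoint names below are my
assumptions (to be checked against the azure-documentdb-spark README), not
something I have run.

// Hedged sketch: read from a Hive table (or Blob storage) and push the changed
// rows to CosmosDB through the azure-documentdb-spark connector.
import org.apache.spark.sql.{SaveMode, SparkSession}

val spark = SparkSession.builder()
  .appName("upsert-to-cosmosdb")
  .enableHiveSupport()
  .getOrCreate()

// Source: a Hive table; spark.read.parquet("wasb://container@account/...") would
// cover the Blob storage case. Table and column names are placeholders.
val changes = spark.table("my_db.my_source_table")
  .filter("updated_at > date_sub(current_date(), 1)")

// Sink: CosmosDB (formerly DocumentDB) via the Spark connector. The format name
// and option keys are assumptions -- check the connector docs for exact spelling.
changes.write
  .format("com.microsoft.azure.documentdb.spark")
  .option("Endpoint", "https://<account>.documents.azure.com:443/")
  .option("Masterkey", "<master-key>")
  .option("Database", "<database>")
  .option("Collection", "<collection>")
  .mode(SaveMode.Append)  // append vs. upsert depends on the connector's write config
  .save()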

Looking forward to hearing your take.

Best
ayan



On Mon, May 15, 2017 at 8:24 AM, Denny Lee  wrote:

> Sorry for the delay; you just did, as I'm with the Azure CosmosDB (formerly
> DocumentDB) team. If you'd like to make it official, why not add an issue
> to the GitHub repo at https://github.com/Azure/azure-documentdb-spark/issues. HTH!
>
> On Thu, May 11, 2017 at 9:08 PM ayan guha  wrote:
>
>> Works for me too... you are a life-saver :)
>>
>> But the question: should we, and how do we, report this to the Azure team?
>>
>> On Fri, May 12, 2017 at 10:32 AM, Denny Lee 
>> wrote:
>>
>>> I was able to repro your issue when I downloaded the jars via the blob
>>> URLs, but when I downloaded them as raw, I was able to get everything up
>>> and running. For example:
>>>
>>> wget https://github.com/Azure/azure-documentdb-spark/*blob*/master/releases/azure-documentdb-spark-0.0.3_2.0.2_2.11/azure-documentdb-1.10.0.jar
>>> wget https://github.com/Azure/azure-documentdb-spark/*blob*/master/releases/azure-documentdb-spark-0.0.3_2.0.2_2.11/azure-documentdb-spark-0.0.3-SNAPSHOT.jar
>>> spark-shell --master yarn --jars azure-documentdb-spark-0.0.3-SNAPSHOT.jar,azure-documentdb-1.10.0.jar
>>>
>>> resulted in the error:
>>> SPARK_MAJOR_VERSION is set to 2, using Spark2
>>> Setting default log level to "WARN".
>>> To adjust logging level use sc.setLogLevel(newLevel).
>>> [init] error: error while loading , Error accessing
>>> /home/sshuser/jars/test/azure-documentdb-spark-0.0.3-SNAPSHOT.jar
>>>
>>> Failed to initialize compiler: object java.lang.Object in compiler
>>> mirror not found.
>>> ** Note that as of 2.8 scala does not assume use of the java classpath.
>>> ** For the old behavior pass -usejavacp to scala, or if using a Settings
>>> ** object programmatically, settings.usejavacp.value = true.
>>>
>>> But when running:
>>> wget https://github.com/Azure/azure-documentdb-spark/raw/master/releases/azure-documentdb-spark-0.0.3_2.0.2_2.11/azure-documentdb-1.10.0.jar
>>> wget https://github.com/Azure/azure-documentdb-spark/raw/master/releases/azure-documentdb-spark-0.0.3_2.0.2_2.11/azure-documentdb-spark-0.0.3-SNAPSHOT.jar
>>> spark-shell --master yarn --jars azure-documentdb-spark-0.0.3-SNAPSHOT.jar,azure-documentdb-1.10.0.jar
>>>
>>> it was up and running:
>>> spark-shell --master yarn --jars azure-documentdb-spark-0.0.3-SNAPSHOT.jar,azure-documentdb-1.10.0.jar
>>> SPARK_MAJOR_VERSION is set to 2, using Spark2
>>> Setting default log level to "WARN".
>>> To adjust logging level use sc.setLogLevel(newLevel).
>>> 17/05/11 22:54:06 WARN SparkContext: Use an existing SparkContext, some
>>> configuration may not take effect.
>>> Spark context Web UI available at http://10.0.0.22:4040
>>> Spark context available as 'sc' (master = yarn, app id =
>>> application_1494248502247_0013).
>>> Spark session available as 'spark'.
>>> Welcome to
>>>     __
>>>  / __/__  ___ _/ /__
>>> _\ \/ _ \/ _ `/ __/  '_/
>>>/___/ .__/\_,_/_/ /_/\_\   version 2.0.2.2.5.4.0-121
>>>   /_/
>>>
>>> Using Scala version 2.11.8 (OpenJDK 64-Bit Server VM, Java 1.8.0_121)
>>> Type in expressions to have them evaluated.
>>> Type :help for more information.
>>>
>>> scala>
>>>
>>> HTH!
>>>
>>>
>>> On Wed, May 10, 2017 at 11:49 PM ayan guha  wrote:
>>>
 Hi

 Thanks for the reply, but unfortunately it did not work. I am getting the same error.

 sshuser@ed0-svochd:~/azure-spark-docdb-test$ spark-shell --jars
 azure-documentdb-spark-0.0.3-SNAPSHOT.jar,azure-documentdb-1.10.0.jar
 SPARK_MAJOR_VERSION is set to 2, using Spark2
 Setting default log level to "WARN".
 To adjust logging level use sc.setLogLevel(newLevel).
 [init] error: error while loading , Error accessing
 /home/sshuser/azure-spark-docdb-test/azure-documentdb-spark-0.0.3-SNAPSHOT.jar

 Failed to initialize compiler: object java.lang.Object in compiler
 mirror not found.
 ** Note that as of 2.8 scala does not assume use of the java classpath.
 ** For the old behavior pass -usejavacp to scala, or if using a Settings
 ** object programmatically, settings.usejavacp.value = true.

 Failed to initialize compiler: object java.lang.Object in compiler
 mirror not found.
 ** Note that as of 2.8 scala does not assume use of the java classpath.
 

Re: Spark Shell issue on HDInsight

2017-05-14 Thread Denny Lee
Sorry for the delay; you just did, as I'm with the Azure CosmosDB (formerly
DocumentDB) team. If you'd like to make it official, why not add an issue
to the GitHub repo at https://github.com/Azure/azure-documentdb-spark/issues.
HTH!

On Thu, May 11, 2017 at 9:08 PM ayan guha  wrote:

> Works for me too... you are a life-saver :)
>
> But the question: should we, and how do we, report this to the Azure team?
>
> On Fri, May 12, 2017 at 10:32 AM, Denny Lee  wrote:
>
>> I was able to repro your issue when I downloaded the jars via the blob
>> URLs, but when I downloaded them as raw, I was able to get everything up
>> and running. For example:
>>
>> wget https://github.com/Azure/azure-documentdb-spark/*blob*/master/releases/azure-documentdb-spark-0.0.3_2.0.2_2.11/azure-documentdb-1.10.0.jar
>> wget https://github.com/Azure/azure-documentdb-spark/*blob*/master/releases/azure-documentdb-spark-0.0.3_2.0.2_2.11/azure-documentdb-spark-0.0.3-SNAPSHOT.jar
>> spark-shell --master yarn --jars azure-documentdb-spark-0.0.3-SNAPSHOT.jar,azure-documentdb-1.10.0.jar
>>
>> resulted in the error:
>> SPARK_MAJOR_VERSION is set to 2, using Spark2
>> Setting default log level to "WARN".
>> To adjust logging level use sc.setLogLevel(newLevel).
>> [init] error: error while loading , Error accessing
>> /home/sshuser/jars/test/azure-documentdb-spark-0.0.3-SNAPSHOT.jar
>>
>> Failed to initialize compiler: object java.lang.Object in compiler mirror
>> not found.
>> ** Note that as of 2.8 scala does not assume use of the java classpath.
>> ** For the old behavior pass -usejavacp to scala, or if using a Settings
>> ** object programmatically, settings.usejavacp.value = true.
>>
>> But when running:
>> wget https://github.com/Azure/azure-documentdb-spark/raw/master/releases/azure-documentdb-spark-0.0.3_2.0.2_2.11/azure-documentdb-1.10.0.jar
>> wget https://github.com/Azure/azure-documentdb-spark/raw/master/releases/azure-documentdb-spark-0.0.3_2.0.2_2.11/azure-documentdb-spark-0.0.3-SNAPSHOT.jar
>> spark-shell --master yarn --jars azure-documentdb-spark-0.0.3-SNAPSHOT.jar,azure-documentdb-1.10.0.jar
>>
>> it was up and running:
>> spark-shell --master yarn --jars azure-documentdb-spark-0.0.3-SNAPSHOT.jar,azure-documentdb-1.10.0.jar
>> SPARK_MAJOR_VERSION is set to 2, using Spark2
>> Setting default log level to "WARN".
>> To adjust logging level use sc.setLogLevel(newLevel).
>> 17/05/11 22:54:06 WARN SparkContext: Use an existing SparkContext, some
>> configuration may not take effect.
>> Spark context Web UI available at http://10.0.0.22:4040
>> Spark context available as 'sc' (master = yarn, app id =
>> application_1494248502247_0013).
>> Spark session available as 'spark'.
>> Welcome to
>>     __
>>  / __/__  ___ _/ /__
>> _\ \/ _ \/ _ `/ __/  '_/
>>/___/ .__/\_,_/_/ /_/\_\   version 2.0.2.2.5.4.0-121
>>   /_/
>>
>> Using Scala version 2.11.8 (OpenJDK 64-Bit Server VM, Java 1.8.0_121)
>> Type in expressions to have them evaluated.
>> Type :help for more information.
>>
>> scala>
>>
>> HTH!
>>
>>
>> On Wed, May 10, 2017 at 11:49 PM ayan guha  wrote:
>>
>>> Hi
>>>
>>> Thanks for the reply, but unfortunately it did not work. I am getting the same error.
>>>
>>> sshuser@ed0-svochd:~/azure-spark-docdb-test$ spark-shell --jars
>>> azure-documentdb-spark-0.0.3-SNAPSHOT.jar,azure-documentdb-1.10.0.jar
>>> SPARK_MAJOR_VERSION is set to 2, using Spark2
>>> Setting default log level to "WARN".
>>> To adjust logging level use sc.setLogLevel(newLevel).
>>> [init] error: error while loading , Error accessing
>>> /home/sshuser/azure-spark-docdb-test/azure-documentdb-spark-0.0.3-SNAPSHOT.jar
>>>
>>> Failed to initialize compiler: object java.lang.Object in compiler
>>> mirror not found.
>>> ** Note that as of 2.8 scala does not assume use of the java classpath.
>>> ** For the old behavior pass -usejavacp to scala, or if using a Settings
>>> ** object programmatically, settings.usejavacp.value = true.
>>>
>>> Failed to initialize compiler: object java.lang.Object in compiler
>>> mirror not found.
>>> ** Note that as of 2.8 scala does not assume use of the java classpath.
>>> ** For the old behavior pass -usejavacp to scala, or if using a Settings
>>> ** object programmatically, settings.usejavacp.value = true.
>>> Exception in thread "main" java.lang.NullPointerException
>>> at
>>> scala.reflect.internal.SymbolTable.exitingPhase(SymbolTable.scala:256)
>>> at
>>> scala.tools.nsc.interpreter.IMain$Request.x$20$lzycompute(IMain.scala:896)
>>> at
>>> scala.tools.nsc.interpreter.IMain$Request.x$20(IMain.scala:895)
>>> at
>>> scala.tools.nsc.interpreter.IMain$Request.headerPreamble$lzycompute(IMain.scala:895)
>>> at
>>> scala.tools.nsc.interpreter.IMain$Request.headerPreamble(IMain.scala:895)
>>> at
>>> scala.tools.nsc.interpreter.IMain$Request$Wrapper.preamble(IMain.scala:918)
>>> at
>>> 

[PYTHON] PySpark typing hints

2017-05-14 Thread Maciej Szymkiewicz
Hi everyone,

For the last few months I've been working on static type annotations for
PySpark. For those of you who are not familiar with the idea, type hints
were introduced by PEP 484
(https://www.python.org/dev/peps/pep-0484/) and further extended by
PEP 526 (https://www.python.org/dev/peps/pep-0526/), with the main goal
of providing the information required for static analysis. Right now there are
a few tools which support type hints, including Mypy
(https://github.com/python/mypy) and PyCharm
(https://www.jetbrains.com/help/pycharm/2017.1/type-hinting-in-pycharm.html).
Type hints can be added using function annotations
(https://www.python.org/dev/peps/pep-3107/, Python 3 only), docstrings,
or source-independent stub files
(https://www.python.org/dev/peps/pep-0484/#stub-files). Typing is
optional, gradual, and has no runtime impact.

At this point I've annotated the majority of the API, including most of
pyspark.sql and pyspark.ml. The project is still rough around the edges
and may produce both false positives and false negatives, but I think it
has become mature enough to be useful in practice.

The current version is compatible only with Python 3, but it is
possible, with some limitations, to backport it to Python 2 (though it
is not on my todo list).

There are a number of possible benefits for PySpark users and developers:

  * Static analysis can detect a number of common mistakes and prevent
    runtime failures. Generic self is still fairly limited, so it is more
    useful with DataFrames, Structured Streaming (SS) and ML than with
    RDDs or DStreams.
  * Annotations can be used for documenting complex signatures
    (https://git.io/v95JN), including dependencies between arguments and
    return values (https://git.io/v95JA).
  * Detecting possible bugs in Spark itself (SPARK-20631).
  * Showing API inconsistencies.

Roadmap

  * Update the project to reflect Spark 2.2.
  * Refine existing annotations.

If there is enough interest, I am happy to contribute this back to
Spark or submit it to Typeshed (https://github.com/python/typeshed - this
would require formal ASF approval, and since Typeshed doesn't provide
versioning, it is probably not the best option in our case).

Further information:

  * https://github.com/zero323/pyspark-stubs - GitHub repository

  * https://speakerdeck.com/marcobonzanini/static-type-analysis-for-robust-data-products-at-pydata-london-2017
    - an interesting presentation by Marco Bonzanini

-- 
Best,
Maciej





Re: Spark <--> S3 flakiness

2017-05-14 Thread Gourav Sengupta
Are you running EMR?

On Sun, May 14, 2017 at 4:59 AM, Miguel Morales 
wrote:

> Some things just didn't work as I had first expected. For example,
> writing from a Spark collection to an Alluxio destination didn't
> persist the files to S3 automatically.
>
> I remember having to use the Alluxio library directly to force the
> files to persist to S3 after Spark finished writing to Alluxio.
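
For reference, a minimal sketch of that pattern is below. This is an
assumption-heavy illustration, not a verified recipe: the Alluxio master host
and port, the paths, and the persist command are placeholders from memory, so
check the Alluxio docs for your version.

// Hedged sketch of the "Alluxio as a buffer in front of S3" pattern described above.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("parquet-via-alluxio").getOrCreate()

// Read and write through Alluxio instead of s3a:// directly; Alluxio's under-store
// is assumed to be configured to point at the S3 bucket.
val df = spark.read.parquet("alluxio://alluxio-master:19998/input/events")
df.write.mode("overwrite").parquet("alluxio://alluxio-master:19998/output/events")

// As noted above, the files are not pushed to S3 automatically; persistence has to
// be forced through Alluxio itself, for example via its shell (command name from
// memory, not verified):
//   bin/alluxio fs persist /output/events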
>
> On Fri, May 12, 2017 at 6:52 AM, Gene Pang  wrote:
> > Hi,
> >
> > Yes, you can use Alluxio with Spark to read/write to S3. Here is a blog
> post
> > on Spark + Alluxio + S3, and here is some documentation for configuring
> > Alluxio + S3 and configuring Spark + Alluxio.
> >
> > You mentioned that it required a lot of effort to get working. May I ask
> > what you ran into, and how you got it to work?
> >
> > Thanks,
> > Gene
> >
> > On Thu, May 11, 2017 at 11:55 AM, Miguel Morales <
> therevolti...@gmail.com>
> > wrote:
> >>
> >> You might want to try gzip as opposed to Parquet. The only way I
> >> ever reliably got Parquet to work on S3 is by using Alluxio as a
> >> buffer, but it's a decent amount of work.
> >>
> >> On Thu, May 11, 2017 at 11:50 AM, lucas.g...@gmail.com
> >>  wrote:
> >> > Also, and this is unrelated to the actual question... Why don't these
> >> > messages show up in the archive?
> >> >
> >> > http://apache-spark-user-list.1001560.n3.nabble.com/
> >> >
> >> > Ideally I'd want to post a link to our internal wiki for these
> >> > questions,
> >> > but can't find them in the archive.
> >> >
> >> > On 11 May 2017 at 07:16, lucas.g...@gmail.com 
> >> > wrote:
> >> >>
> >> >> Looks like this isn't viable in Spark 2.0.0 (and greater, I presume). I'm
> >> >> pretty sure I came across this blog and ignored it for that reason.
> >> >>
> >> >> Any other thoughts?  The linked tickets in:
> >> >> https://issues.apache.org/jira/browse/SPARK-10063
> >> >> https://issues.apache.org/jira/browse/HADOOP-13786
> >> >> https://issues.apache.org/jira/browse/HADOOP-9565 look relevant too.
> >> >>
> >> >> On 10 May 2017 at 22:24, Miguel Morales 
> >> >> wrote:
> >> >>>
> >> >>> Try using the DirectParquetOutputCommitter:
> >> >>> http://dev.sortable.com/spark-directparquetoutputcommitter/
> >> >>>
> >> >>> On Wed, May 10, 2017 at 10:07 PM, lucas.g...@gmail.com
> >> >>>  wrote:
> >> >>> > Hi users, we have a bunch of pyspark jobs that are using S3 for
> >> >>> > loading / intermediate steps and final output of parquet files.
> >> >>> >
> >> >>> > We're running into the following issues on a semi-regular basis:
> >> >>> > * These are intermittent errors, i.e. we have about 300 jobs that run
> >> >>> > nightly, and a fairly random but small-ish percentage of them fail
> >> >>> > with the following classes of errors.
> >> >>> >
> >> >>> > S3 write errors
> >> >>> >
> >> >>> >> "ERROR Utils: Aborting task
> >> >>> >> com.amazonaws.services.s3.model.AmazonS3Exception: Status Code:
> >> >>> >> 404,
> >> >>> >> AWS
> >> >>> >> Service: Amazon S3, AWS Request ID: 2D3RP, AWS Error Code: null,
> >> >>> >> AWS
> >> >>> >> Error
> >> >>> >> Message: Not Found, S3 Extended Request ID: BlaBlahEtc="
> >> >>> >
> >> >>> >
> >> >>> >>
> >> >>> >> "Py4JJavaError: An error occurred while calling o43.parquet.
> >> >>> >> : com.amazonaws.services.s3.model.MultiObjectDeleteException:
> >> >>> >> Status
> >> >>> >> Code:
> >> >>> >> 0, AWS Service: null, AWS Request ID: null, AWS Error Code: null,
> >> >>> >> AWS
> >> >>> >> Error
> >> >>> >> Message: One or more objects could not be deleted, S3 Extended
> >> >>> >> Request
> >> >>> >> ID:
> >> >>> >> null"
> >> >>> >
> >> >>> >
> >> >>> >
> >> >>> > S3 Read Errors:
> >> >>> >
> >> >>> >> [Stage 1:=>
> >> >>> >> (27
> >> >>> >> + 4)
> >> >>> >> / 31]17/05/10 16:25:23 ERROR Executor: Exception in task 10.0 in
> >> >>> >> stage
> >> >>> >> 1.0
> >> >>> >> (TID 11)
> >> >>> >> java.net.SocketException: Connection reset
> >> >>> >> at java.net.SocketInputStream.read(SocketInputStream.java:196)
> >> >>> >> at java.net.SocketInputStream.read(SocketInputStream.java:122)
> >> >>> >> at sun.security.ssl.InputRecord.readFully(InputRecord.java:442)
> >> >>> >> at sun.security.ssl.InputRecord.readV3Record(InputRecord.java:
> 554)
> >> >>> >> at sun.security.ssl.InputRecord.read(InputRecord.java:509)
> >> >>> >> at
> >> >>> >> sun.security.ssl.SSLSocketImpl.readRecord(
> SSLSocketImpl.java:927)
> >> >>> >> at
> >> >>> >>
> >> >>> >> sun.security.ssl.SSLSocketImpl.readDataRecord(
> SSLSocketImpl.java:884)
> >> >>> >> at sun.security.ssl.AppInputStream.read(AppInputStream.java:102)
> >> >>> >> at
> >> >>> >>
> >> >>> >>
> >> >>> >> org.apache.http.impl.io.AbstractSessionInputBuffer.read(
> AbstractSessionInputBuffer.java:198)
> >> >>> >> at
> >> >>> >>
> >> >>> >>
> >> >>> >> 

Cassandra Simple Insert Statement using Spark SQL Fails with org.apache.spark.sql.catalyst.parser.ParseException

2017-05-14 Thread fattahsafa
I'm trying to insert data into a Cassandra table with Spark SQL as follows:

String query = "CREATE TEMPORARY TABLE my_table USING
org.apache.spark.sql.cassandra OPTIONS (table \"my_table\",keyspace
\"my_keyspace\", pushdown \"true\")";
spark.sparkSession.sql(query);
spark.sparkSession.sql("INSERT INTO my_keyspace.my_table
(column0, column1) VALUES ('value0', 'value1');

However, it fails with the following exception:
Exception in thread "main"
org.apache.spark.sql.catalyst.parser.ParseException:
mismatched input 'column0' expecting {'(', 'SELECT', 'FROM', 'VALUES',
'TABLE', 'INSERT', 'MAP', 'REDUCE'}(line 1, pos 33)

I tried it without the column names and it worked, but my point here is to
insert data into only some of the columns, not all of them.
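
For comparison, a rough sketch of a workaround is to skip the SQL INSERT and
write through the DataFrame API of the same org.apache.spark.sql.cassandra
source, so that only the desired columns are supplied. The keyspace, table, and
column names below come from the example above; the Cassandra connection
settings (e.g. spark.cassandra.connection.host) are assumed to be configured on
the session already, and this is untested.

// Build a DataFrame containing only the columns to be written and append it
// through the Cassandra data source instead of using a SQL INSERT statement.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("cassandra-partial-insert").getOrCreate()
import spark.implicits._

val rows = Seq(("value0", "value1")).toDF("column0", "column1")

rows.write
  .format("org.apache.spark.sql.cassandra")
  .options(Map("table" -> "my_table", "keyspace" -> "my_keyspace"))
  .mode("append")
  .save()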






RE: Spark SQL DataFrame to Kafka Topic

2017-05-14 Thread Revin Chalil
Hi TD / Michael,


I am trying to use the foreach sink to write to Kafka and followed this post
from the Databricks blog by Sunil Sitaula. I get the errors below with
DF.writeStream.foreach(writer).outputMode("update").start() when using a
simple DF:

Type mismatch, expected: foreachWriter[Row], actual: KafkaSink

Cannot resolve reference foreach with such signature



Below is the snippet

val data = session
  .readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", KafkaBroker)
  .option("subscribe", InTopic)
  .load()
  .select($"value".as[Array[Byte]])
  .flatMap(d => {
var events = AvroHelper.readEvents(d)
events.map((event: HdfsEvent) => {
  var payload = EventPayloadParser.read(event.getPayload)
  new KafkaMessage(payload)
})
  })



case class KafkaMessage(
  payload: String)



This is where I use the foreach sink:

val writer = new KafkaSink("kafka-topic", KafkaBroker)
val query = data.writeStream.foreach(writer).outputMode("update").start()



In this case, it shows –

Type mismatch, expected: foreachWriter[Main.KafkaMesage], actual: Main.KafkaSink

Cannot resolve reference foreach with such signature



Any help is much appreciated. Thank you.
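
For reference, my understanding is that the object passed to foreach has to be
a ForeachWriter over the Dataset's element type, i.e. ForeachWriter[KafkaMessage]
here. Below is a minimal, untested sketch of that shape; the broker/serializer
settings are placeholders and error handling is omitted.

// A foreach sink is a ForeachWriter over the row type of the streaming Dataset
// (KafkaMessage in this case). One producer is created per partition in open().
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}
import org.apache.spark.sql.ForeachWriter

class KafkaSink(topic: String, servers: String) extends ForeachWriter[KafkaMessage] {
  private var producer: KafkaProducer[String, String] = _

  override def open(partitionId: Long, version: Long): Boolean = {
    val props = new Properties()
    props.put("bootstrap.servers", servers)
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    producer = new KafkaProducer[String, String](props)
    true
  }

  override def process(value: KafkaMessage): Unit = {
    producer.send(new ProducerRecord[String, String](topic, value.payload))
  }

  override def close(errorOrNull: Throwable): Unit = {
    if (producer != null) producer.close()
  }
}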


From: Tathagata Das [mailto:tathagata.das1...@gmail.com]
Sent: Friday, January 13, 2017 3:31 PM
To: Koert Kuipers 
Cc: Peyman Mohajerian ; Senthil Kumar 
; User ; senthilec...@apache.org
Subject: Re: Spark SQL DataFrame to Kafka Topic

Structured Streaming has a foreach sink, where you can essentially do what you
want with your data. It's easy to create a Kafka producer and write the data
out to Kafka.
http://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#using-foreach

On Fri, Jan 13, 2017 at 8:28 AM, Koert Kuipers 
> wrote:
How do you do this with Structured Streaming? I see no mention of writing to
Kafka.

On Fri, Jan 13, 2017 at 10:30 AM, Peyman Mohajerian 
> wrote:
Yes, it is called Structured Streaming: 
https://docs.databricks.com/_static/notebooks/structured-streaming-kafka.html
http://spark.apache.org/docs/latest/structured-streaming-programming-guide.html

On Fri, Jan 13, 2017 at 3:32 AM, Senthil Kumar 
> wrote:
Hi Team,

Sorry if this question has already been asked in this forum.

Can we ingest data into an Apache Kafka topic from a Spark SQL DataFrame?

Here is my code, which reads a Parquet file:


val sqlContext = new org.apache.spark.sql.SQLContext(sc);

val df = sqlContext.read.parquet("/temp/*.parquet")

df.registerTempTable("beacons")



I want to directly ingest the df DataFrame into Kafka! Is there any way to
achieve this?
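
For the plain-DataFrame (non-streaming) case, one rough sketch is to serialize
each row to JSON and produce it to Kafka per partition. This is untested, and
the broker address and topic name are placeholders; KafkaProducer comes from
the standard Kafka clients library.

// Send a batch DataFrame to Kafka: one producer per partition, one message per row.
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

df.toJSON.foreachPartition { rows: Iterator[String] =>
  val props = new Properties()
  props.put("bootstrap.servers", "broker1:9092")
  props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
  props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
  val producer = new KafkaProducer[String, String](props)
  try {
    rows.foreach(json => producer.send(new ProducerRecord[String, String]("my_topic", json)))
  } finally {
    producer.close()
  }
}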



Cheers,

Senthil





