RE: JDBC Dialect for saving DataFrame into Vertica Table

2016-05-26 Thread Mohammed Guller
Vertica also provides a Spark connector. It was not GA the last time I looked, but it is available on the Vertica community site. Have you tried using the Vertica Spark connector instead of the JDBC driver?
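
In case it helps, this is roughly how such a connector is invoked through the DataFrame writer API. This is a Scala sketch only; the format string and option keys below are from memory and are assumptions, so please check them against the connector's documentation:

import org.apache.spark.sql.SaveMode

// `dataframe` is the DataFrame from your snippet below. The format string and
// option keys here are assumptions, not verified against the connector's docs.
dataframe.write
  .format("com.vertica.spark.datasource.DefaultSource")
  .option("host", "vertica-host")
  .option("db", "mydb")
  .option("user", "dbadmin")
  .option("password", "secret")
  .option("table", "my_table")
  .mode(SaveMode.Append)
  .save()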

Mohammed
Author: Big Data Analytics with Spark

From: Aaron Ilovici [mailto:ailov...@wayfair.com]
Sent: Thursday, May 26, 2016 8:08 AM
To: u...@spark.apache.org; dev@spark.apache.org
Subject: JDBC Dialect for saving DataFrame into Vertica Table

I am attempting to write a DataFrame of Rows to Vertica via DataFrameWriter's 
jdbc function in the following manner:

dataframe.write().mode(SaveMode.Append).jdbc(url, table, properties);

This works when there are no NULL values in any of the Rows in my DataFrame. However, when a Row contains a NULL value, I get the following error:

ERROR Executor: Exception in task 0.0 in stage 3.0 (TID 24)
java.sql.SQLFeatureNotSupportedException: [Vertica][JDBC](10220) Driver not capable.
    at com.vertica.exceptions.ExceptionConverter.toSQLException(Unknown Source)
    at com.vertica.jdbc.common.SPreparedStatement.checkTypeSupported(Unknown Source)
    at com.vertica.jdbc.common.SPreparedStatement.setNull(Unknown Source)

This appears to happen when Spark sets a null value in a PreparedStatement, but the Vertica driver does not support the SQL type that Spark passes to setNull. I see 
in JdbcDialects.scala that there are dialects for MySQL, Postgres, DB2, 
MsSQLServer, Derby, and Oracle.

1 - Would writing a dialect for Vertica alleviate the issue, by setting a 'NULL' 
with a type that Vertica would understand?
2 - What would be the best way to do this without a Spark patch? Scala, Java, 
make a jar and call 'JdbcDialects.registerDialect(VerticaDialect)' once created? 
(A rough sketch follows this list.)
3 - Where would one find the proper mapping between Spark DataTypes and Vertica 
DataTypes? I don't see 'NULL' handling for any of the dialects, only the base 
case 'case _ => None' - is None mapped to the proper NULL type elsewhere?
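
For context, here is a rough Scala sketch of what I imagine the dialect and its registration would look like. This is a sketch only; the particular type mappings are placeholders I have not checked against Vertica's documentation. The relevant piece is the java.sql.Types code carried in each JdbcType, since that is what Spark's JDBC writer passes to PreparedStatement.setNull:

import java.sql.Types

import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcDialects, JdbcType}
import org.apache.spark.sql.types._

// Minimal sketch of a custom dialect. The second argument of each JdbcType is
// the java.sql.Types code that Spark passes to PreparedStatement.setNull,
// which is where the "Driver not capable" error is currently raised.
object VerticaDialect extends JdbcDialect {

  // Claim JDBC URLs that point at Vertica.
  override def canHandle(url: String): Boolean = url.startsWith("jdbc:vertica")

  // Map Catalyst types to a database column type plus a java.sql.Types code.
  // These particular mappings are placeholders for illustration only.
  override def getJDBCType(dt: DataType): Option[JdbcType] = dt match {
    case StringType  => Some(JdbcType("VARCHAR(65000)", Types.VARCHAR))
    case BooleanType => Some(JdbcType("BOOLEAN", Types.BOOLEAN))
    case _           => None  // fall back to Spark's common mappings
  }
}

// Register once per JVM, before calling DataFrameWriter.jdbc.
JdbcDialects.registerDialect(VerticaDialect)

If that is right, packaging the object in a jar on the driver and executor classpaths and calling registerDialect at startup should be enough for DataFrameWriter.jdbc to pick it up via the URL match, without a Spark patch.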

My environment: Spark 1.6, Vertica Driver 7.2.2, Java 1.7

I would be happy to create a Jira and submit a pull request with the 
VerticaDialect once I figure this out.

Thank you for any insight on this,

AARON ILOVICI
Software Engineer
Marketing Engineering


WAYFAIR
4 Copley Place
Boston, MA 02116
(617) 532-6100 x1231
ailov...@wayfair.com




Request to add a new book to the Books section on Spark's website

2016-03-09 Thread Mohammed Guller
My book on Spark was recently published. I would like to request that it be added 
to the Books section on Spark's website.

Here are the details about the book.

Title: Big Data Analytics with Spark
Author: Mohammed Guller
Link: http://www.amazon.com/Big-Data-Analytics-Spark-Practitioners/dp/1484209656/

Brief Description:
This book is a hands-on guide for learning how to use Spark for different types 
of analytics, including batch, interactive, graph, and stream data analytics as 
well as machine learning. It covers Spark core, Spark SQL, DataFrames, Spark 
Streaming, GraphX, MLlib, and Spark ML. Plenty of examples are provided for the 
readers to practice with.

In addition to covering Spark in depth, the book provides an introduction to 
other big data technologies that are commonly used along with Spark, such as 
HDFS, Parquet, Kafka, Avro, Cassandra, HBase, Mesos, and YARN. The book also 
includes a primer on functional programming and Scala.

Please let me know if you need any other information.

Thanks,
Mohammed




looking for a technical reviewer to review a book on Spark

2015-09-09 Thread Mohammed Guller
Hi Spark developers,

I am writing a book on Spark. The publisher of the book is looking for a 
technical reviewer. You will be compensated for your time. The publisher will 
pay a flat rate per page for the review.

I spoke with Matei Zaharia about this and he suggested that I send an email to 
the dev mailing list.

The book covers Spark core and the Spark libraries, including Spark SQL, Spark 
Streaming, MLlib, Spark ML, and GraphX. It also covers operational aspects such 
as deployment with different cluster managers and monitoring.

Please let me know if you are interested and I will connect you with the 
publisher.

Thanks,
Mohammed

Principal Architect, Glassbeam Inc., www.glassbeam.com
5201 Great America Parkway, Suite 360, Santa Clara, CA 95054
p: +1.408.740.4610, m: +1.925.786.7521, f: +1.408.740.4601, skype: mguller