When attempting to write a DataFrame to SQL Server that contains java.sql.Timestamp or java.lang.Boolean columns, I get errors saying the generated query is invalid. Specifically, java.sql.Timestamp values are written as the TIMESTAMP type, which is not a date/time storage type in T-SQL; they should be DATETIME or DATETIME2.
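For illustration, here is a minimal sketch of the kind of type override I would expect Spark to apply for SQL Server. The names (TSQL_TYPE_OVERRIDES, tsql_column_type) are hypothetical, not actual Spark API; the real fix would presumably live in a JDBC dialect inside Spark:

```python
# Hypothetical sketch of a SQL Server type mapping; not actual Spark API.
# In T-SQL, TIMESTAMP is a rowversion, not a date/time type, and BIT takes
# no width specifier, so both generic JDBC mappings need overriding.

TSQL_TYPE_OVERRIDES = {
    "TimestampType": "DATETIME2",
    "BooleanType": "BIT",
}

def tsql_column_type(spark_type_name, generic_jdbc_type):
    """Return the T-SQL column type for a Spark SQL type, overriding the
    generic JDBC mapping where SQL Server disagrees with it."""
    return TSQL_TYPE_OVERRIDES.get(spark_type_name, generic_jdbc_type)
```

With a mapping like this, a TimestampType column would become DATETIME2 instead of the invalid TIMESTAMP, a BooleanType column would become a plain BIT with no width, and everything else (strings, ints) would keep its generic mapping, which already works.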
java.lang.Boolean errors out because Spark tries to specify a width for the BIT field, which SQL Server doesn't accept. However, I can write strings/varchars and ints without any issues.

________________________________________
From: Davies Liu [dav...@databricks.com]
Sent: Monday, July 20, 2015 9:08 AM
To: Young, Matthew T
Cc: user@spark.apache.org
Subject: Re: Spark and SQL Server

Sorry for the confusion. What are the other issues?

On Mon, Jul 20, 2015 at 8:26 AM, Young, Matthew T <matthew.t.yo...@intel.com> wrote:
> Thanks Davies, that resolves the issue with Python.
>
> I was using the Java/Scala DataFrame documentation
> <https://spark.apache.org/docs/1.4.0/api/java/org/apache/spark/sql/DataFrameWriter.html>
> and assuming that it was the same for PySpark
> <http://spark.apache.org/docs/1.4.0/api/python/pyspark.sql.html#pyspark.sql.DataFrameWriter>.
> I will keep this distinction in mind going forward.
>
> I guess we have to wait for Microsoft to release a SQL Server connector for
> Spark to resolve the other issues.
>
> Cheers,
>
> -- Matthew Young
>
> ________________________________________
> From: Davies Liu [dav...@databricks.com]
> Sent: Saturday, July 18, 2015 12:45 AM
> To: Young, Matthew T
> Cc: user@spark.apache.org
> Subject: Re: Spark and SQL Server
>
> I think you have a mistake in your call to jdbc(); it should be:
>
> jdbc(self, url, table, mode, properties)
>
> You had passed properties as the third parameter.
>
> On Fri, Jul 17, 2015 at 10:15 AM, Young, Matthew T
> <matthew.t.yo...@intel.com> wrote:
>> Hello,
>>
>> I am testing Spark interoperation with SQL Server via JDBC with Microsoft's
>> 4.2 JDBC Driver. Reading from the database works fine, but I have
>> encountered a couple of issues writing back. In Scala 2.10 I can write
>> back to the database except for a couple of types.
>>
>> 1. When I read a DataFrame from a table that contains a datetime column,
>> it comes in as a java.sql.Timestamp object in the DataFrame.
>> This is alright for Spark purposes, but when I go to write this back to
>> the database with df.write.jdbc(...) it errors out because Spark tries to
>> write the TIMESTAMP type to SQL Server, which is not a date/time storage
>> type in T-SQL. I think it should be writing a DATETIME, but I'm not sure
>> how to tell Spark this.
>>
>> 2. A related misunderstanding happens when I try to write a
>> java.lang.Boolean to the database; it errors out because Spark tries to
>> specify the width of the BIT type, which is illegal in SQL Server (error
>> msg: Cannot specify a column width on data type bit). Do I need to edit
>> the Spark source to fix this behavior, or is there a configuration option
>> somewhere that I am not aware of?
>>
>> When I attempt to write back to SQL Server from an IPython notebook, py4j
>> seems unable to convert a Python dict into a Java HashMap, which is
>> necessary for parameter passing. I've documented the details of this
>> problem, with code examples, here:
>> <http://stackoverflow.com/questions/31417653/java-util-hashmap-missing-in-pyspark-session>
>> Any advice would be appreciated.
>>
>> Thank you for your time,
>>
>> -- Matthew Young
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>> For additional commands, e-mail: user-h...@spark.apache.org
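To make the argument-order mistake Davies describes concrete, here is a stand-in mock (not actual PySpark) with the same parameter order the thread quotes for DataFrameWriter.jdbc, showing how passing the properties dict as the third positional argument silently binds it to mode:

```python
# Mock with the parameter order quoted in the thread for PySpark 1.4's
# DataFrameWriter.jdbc: jdbc(self, url, table, mode, properties).
# This is a stand-in for illustration, not the real Spark method; it just
# reports how the arguments were bound.

def jdbc(url, table, mode=None, properties=None):
    return {"url": url, "table": table, "mode": mode,
            "properties": properties or {}}

props = {"user": "spark", "password": "secret"}  # placeholder credentials

# Mistake: properties passed third, so it lands in `mode`.
wrong = jdbc("jdbc:sqlserver://host;database=db", "dbo.SomeTable", props)

# Fix: pass properties by keyword (or as the fourth argument).
right = jdbc("jdbc:sqlserver://host;database=db", "dbo.SomeTable",
             mode="append", properties=props)
```

In the wrong call the credentials dict ends up as the write mode and no connection properties reach the driver, which matches the "mistake on call jdbc()" Davies points out.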