Re: Reading Excel (.xlsm) file through PySpark 2.1.1 with external JAR is causing fatal conversion of data type

2017-08-16 Thread Jörn Franke
You can use Apache POI DateUtil to convert the double to a Date (https://poi.apache.org/apidocs/org/apache/poi/ss/usermodel/DateUtil.html). Alternatively you can try HadoopOffice (https://github.com/ZuInnoTe/hadoopoffice/wiki); it supports Spark 1.x and Spark 2.0 data sources. > On 16. Aug 2017, at 20:15,
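The conversion POI's DateUtil performs can also be reproduced in plain Python, which may be handy on the PySpark side. This is a minimal sketch assuming the workbook uses Excel's default 1900 date system, where the epoch is effectively 1899-12-30 because of Excel's historical leap-year bug:

```python
from datetime import datetime, timedelta

# Epoch for Excel's 1900 date system (offset by 2 days to absorb the
# fictitious 1900-02-29 that Excel counts).
EXCEL_EPOCH = datetime(1899, 12, 30)

def excel_serial_to_datetime(serial: float) -> datetime:
    """Convert an Excel serial-date double (1900 system) to a datetime.

    The fractional part of the serial encodes the time of day.
    """
    return EXCEL_EPOCH + timedelta(days=serial)

# Serial 25569 is the well-known value for the Unix epoch, 1970-01-01.
print(excel_serial_to_datetime(25569.0))  # 1970-01-01 00:00:00
```

The same arithmetic can be applied in a UDF to repair a column that was inferred as double.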

Re: Reading Excel (.xlsm) file through PySpark 2.1.1 with external JAR is causing fatal conversion of data type

2017-08-16 Thread Aakash Basu
Hey Irving, Thanks for the quick revert. In Excel that column is purely a string. I actually want to import it as a string and later play around with the DF to convert it back to a date type, but the API itself is not allowing me to dynamically assign a schema to the DF and I'm forced to inferSchema,

Re: Reading Excel (.xlsm) file through PySpark 2.1.1 with external JAR is causing fatal conversion of data type

2017-08-16 Thread Irving Duran
I think there is a difference between the actual value in the cell and how Excel formats that cell. You probably want to import that field as a string, or not have it in a date format in Excel. Just a thought. Thank You, Irving Duran On Wed, Aug 16, 2017 at 12:47 PM, Aakash Basu

Re: Reading Excel (.xlsm) file through PySpark 2.1.1 with external JAR is causing fatal conversion of data type

2017-08-16 Thread Aakash Basu
Hey all, I forgot to attach the link to the discussion about overriding the schema through the external package: https://github.com/crealytics/spark-excel/pull/13 You can see my comment there too. Thanks, Aakash. On Wed, Aug 16, 2017 at 11:11 PM, Aakash Basu wrote: > Hi all, >

Reading Excel (.xlsm) file through PySpark 2.1.1 with external JAR is causing fatal conversion of data type

2017-08-16 Thread Aakash Basu
Hi all, I am working on PySpark (*Python 3.6 and Spark 2.1.1*) and trying to fetch data from an Excel file using *spark.read.format("com.crealytics.spark.excel")*, but it is inferring double for a date-type column. The detailed description is given here (the question I posted) -

Re: Restart streaming query spark 2.1 structured streaming

2017-08-16 Thread purna pradeep
Also, is query.stop() a graceful stop operation? What happens to already-received data; will it be processed? On Tue, Aug 15, 2017 at 7:21 PM purna pradeep wrote: > Ok thanks > > Few more > > 1. When I looked into the documentation it says onQueryProgress is not >

Reading parquet file in stream

2017-08-16 Thread HARSH TAKKAR
Hi, I want to read an HDFS directory which contains parquet files. How can I stream data from this directory using the streaming context (ssc.fileStream)? Harsh

Thrift-Server JDBC ResultSet Cursor Reset or Previous

2017-08-16 Thread Imran Rajjad
Dear List, Are there any future plans to implement cursor reset or previous-record functionality in the Thrift Server's JDBC driver? Are there any other alternatives? java.sql.SQLException: Method not supported at org.apache.hive.jdbc.HiveBaseResultSet.previous(HiveBaseResultSet.java:643) regards
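Until previous()/reset support exists in the driver, a common workaround is to buffer fetched rows client-side and navigate the buffer rather than the forward-only result set. A minimal sketch of that idea, using Python's stdlib sqlite3 purely as a stand-in for the real connection (the class name and API here are illustrative, not part of any driver):

```python
import sqlite3

class BufferedResultSet:
    """Wrap a forward-only cursor; cache rows so callers can move backwards."""

    def __init__(self, cursor):
        self._rows = cursor.fetchall()  # drain the forward-only cursor once
        self._pos = -1                  # start before the first row, JDBC-style

    def next(self):
        if self._pos + 1 < len(self._rows):
            self._pos += 1
            return self._rows[self._pos]
        return None  # past the last row

    def previous(self):
        if self._pos > 0:
            self._pos -= 1
            return self._rows[self._pos]
        return None  # before the first row

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(1,), (2,), (3,)])
rs = BufferedResultSet(conn.execute("SELECT x FROM t ORDER BY x"))
print(rs.next(), rs.next(), rs.previous())  # (1,) (2,) (1,)
```

The trade-off is memory: the whole result set is held client-side, so this only suits bounded queries.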

Re: Reading CSV with multiLine option invalidates encoding option.

2017-08-16 Thread Takeshi Yamamuro
Hi, Since the CSV source currently supports ASCII-compatible charsets, I guess shift-jis also works well. You could check Hyukjin's comment in https://issues.apache.org/jira/browse/SPARK-21289 for more info. On Wed, Aug 16, 2017 at 2:54 PM, Han-Cheol Cho wrote: > My
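The "ASCII-compatible" point can be checked directly: Shift_JIS encodes the ASCII range with the same single bytes as ASCII, and its double-byte sequences never use byte values below 0x40 in the second byte, so the comma and newline delimiters survive byte-level splitting before decoding. A small stdlib-only illustration (the sample CSV content is made up):

```python
# A tiny Shift_JIS-encoded CSV with Japanese field values.
csv_bytes = "id,名前\n1,山田\n2,田中\n".encode("shift_jis")

# ',' (0x2C) and '\n' (0x0A) keep their ASCII byte values and cannot occur
# inside a Shift_JIS multi-byte sequence, so splitting on raw bytes is safe.
for line in csv_bytes.split(b"\n"):
    if line:
        fields = [f.decode("shift_jis") for f in line.split(b",")]
        print(fields)
```

A non-ASCII-compatible encoding such as UTF-16 would break this, since its delimiters are multi-byte.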