Re: Reading Excel (.xlsm) file through PySpark 2.1.1 with external JAR is causing fatal conversion of data type

2017-08-18 Thread Jörn Franke
You have forgotten a y: it must be MM/dd/yyyy. > On 17. Aug 2017, at 21:30, Aakash Basu wrote: > Hi Palwell, > Tried doing that, but it's becoming null for all the dates after the transformation with functions. > df2 =
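
For reference, to_date() in Spark 2.1.1 takes no format argument, so the explicit pattern has to go through unix_timestamp(). A minimal sketch, reusing the dflead / Enter_Date names from the thread and assuming the column holds MM/dd/yyyy strings:

    from pyspark.sql import functions as f

    # Parse with an explicit pattern, then cast down to a date.
    # unix_timestamp() returns null for rows that do not match the pattern.
    df2 = dflead.withColumn(
        'Enter_Date_parsed',
        f.unix_timestamp(f.col('Enter_Date'), 'MM/dd/yyyy')
         .cast('timestamp')
         .cast('date'))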

Re: Reading Excel (.xlsm) file through PySpark 2.1.1 with external JAR is causing fatal conversion of data type

2017-08-17 Thread Aakash Basu
Hi Palwell, Tried doing that, but it's becoming null for all the dates after the transformation with functions. df2 = dflead.select('Enter_Date', f.to_date(df2.Enter_Date)) Any insight? Thanks, Aakash. On Fri, Aug 18, 2017 at 12:23 AM, Patrick Alwell
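
A plausible reason for the nulls: in Spark 2.1, to_date() only understands 'yyyy-MM-dd' strings and silently returns null for anything else; the snippet also references df2 on the right-hand side before it exists. A corrected sketch, assuming the source column lives in dflead and is formatted MM/dd/yyyy:

    from pyspark.sql import functions as f

    # Read from dflead (not the df2 being defined) and parse with the
    # real pattern via unix_timestamp() instead of bare to_date().
    df2 = dflead.select(
        'Enter_Date',
        f.unix_timestamp(dflead['Enter_Date'], 'MM/dd/yyyy')
         .cast('timestamp')
         .cast('date')
         .alias('Enter_Date_as_date'))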

Re: Reading Excel (.xlsm) file through PySpark 2.1.1 with external JAR is causing fatal conversion of data type

2017-08-17 Thread Aakash Basu
Hey all, Thanks! I had a discussion with the person who authored that package and informed him about this bug; in the meantime, working with the same package, I found a small tweak that gets the job done. Now I'm getting the date as a string by predefining the Schema, but I want to later
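
A sketch of what "predefining the Schema" could look like here, assuming the spark-excel build in use accepts an explicit schema (the capability discussed in the pull request linked further down the thread); the column name, sheet name, and file path are placeholders, not from the original post:

    from pyspark.sql.types import StructType, StructField, StringType

    # Force the problem column to arrive as a string instead of letting
    # inferSchema coerce it to a double.
    schema = StructType([
        StructField('Enter_Date', StringType(), True),
        # ... remaining columns ...
    ])

    dflead = spark.read.format('com.crealytics.spark.excel') \
        .option('sheetName', 'Sheet1') \
        .option('useHeader', 'true') \
        .schema(schema) \
        .load('leads.xlsm')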

Re: Reading Excel (.xlsm) file through PySpark 2.1.1 with external JAR is causing fatal conversion of data type

2017-08-16 Thread Jörn Franke
You can use Apache POI's DateUtil to convert the double to a Date (https://poi.apache.org/apidocs/org/apache/poi/ss/usermodel/DateUtil.html). Alternatively you can try HadoopOffice (https://github.com/ZuInnoTe/hadoopoffice/wiki), which supports Spark 1.x and Spark 2.0 datasources. > On 16. Aug 2017, at 20:15,
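
DateUtil is a Java API; a rough PySpark stand-in for the same conversion, assuming the workbook uses Excel's default 1900 date system (serial number = days since 1899-12-30; the sketch ignores the phantom 1900 leap day that affects serials below 61):

    import datetime
    from pyspark.sql import functions as f
    from pyspark.sql.types import DateType

    EXCEL_EPOCH = datetime.date(1899, 12, 30)

    # Shift the inferred double (an Excel serial day count) onto a real date.
    to_excel_date = f.udf(
        lambda serial: EXCEL_EPOCH + datetime.timedelta(days=int(serial))
                       if serial is not None else None,
        DateType())

    dflead = dflead.withColumn('Enter_Date',
                               to_excel_date(dflead['Enter_Date']))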

Re: Reading Excel (.xlsm) file through PySpark 2.1.1 with external JAR is causing fatal conversion of data type

2017-08-16 Thread Aakash Basu
Hey Irving, Thanks for the quick reply. In Excel that column is purely a string. I actually want to import it as a String and later play around with the DF to convert it back to a date type, but the API itself is not allowing me to dynamically assign a Schema to the DF and I'm forced to inferSchema,
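
One hedged workaround under those constraints: with the mid-2017 spark-excel releases, switching inferSchema off is supposed to make every column come back as a string, sidestepping the date-to-double coercion until a proper schema can be applied (whether it actually behaves this way depends on the package version; sheet name and path are placeholders):

    dflead = spark.read.format('com.crealytics.spark.excel') \
        .option('sheetName', 'Sheet1') \
        .option('useHeader', 'true') \
        .option('inferSchema', 'false') \
        .load('leads.xlsm')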

Re: Reading Excel (.xlsm) file through PySpark 2.1.1 with external JAR is causing fatal conversion of data type

2017-08-16 Thread Irving Duran
I think there is a difference between the actual value in the cell and how Excel formats that cell. You probably want to import that field as a string, or not format it as a date in Excel. Just a thought. Thank you, Irving Duran On Wed, Aug 16, 2017 at 12:47 PM, Aakash Basu

Re: Reading Excel (.xlsm) file through PySpark 2.1.1 with external JAR is causing fatal conversion of data type

2017-08-16 Thread Aakash Basu
Hey all, Forgot to attach the link to the discussion about overriding the Schema through the external package: https://github.com/crealytics/spark-excel/pull/13 You can see my comment there too. Thanks, Aakash. On Wed, Aug 16, 2017 at 11:11 PM, Aakash Basu wrote: > Hi all, >

Reading Excel (.xlsm) file through PySpark 2.1.1 with external JAR is causing fatal conversion of data type

2017-08-16 Thread Aakash Basu
Hi all, I am working on PySpark (*Python 3.6 and Spark 2.1.1*) and trying to fetch data from an Excel file using *spark.read.format("com.crealytics.spark.excel")*, but it is inferring a double for a date-type column. The detailed description is given here (the question I posted) -
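
A minimal reproduction of the read described above (sheet name, file path, and column name are placeholders, not from the original post):

    # With inferSchema on, the date column arrives as a double, i.e. the
    # raw Excel serial number, rather than a date or string.
    dflead = spark.read.format('com.crealytics.spark.excel') \
        .option('sheetName', 'Sheet1') \
        .option('useHeader', 'true') \
        .option('inferSchema', 'true') \
        .load('data.xlsm')

    dflead.printSchema()   # Enter_Date shows up as: double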