Spark preserve timestamp

2018-01-12 Thread sk skk
Do we have an option to tell Spark to preserve the timestamp while creating a struct? Regards, Sudhir
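If the timestamps only look different after a round trip, pinning the session timezone is one common knob to check. A hedged config sketch, not a confirmed fix for this thread (`spark.sql.session.timeZone` exists from Spark 2.2; the class and jar names below are hypothetical):

```shell
# Sketch: pin the session timezone so timestamps render consistently
# across the writer and any downstream readers.
# --class and jar name are placeholders for illustration only.
spark-submit \
  --conf spark.sql.session.timeZone=UTC \
  --class com.example.MyJob \
  my-job.jar
```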

Timestamp changing while writing

2018-01-11 Thread sk skk
Hello, I am using createDataFrame, passing a Java row RDD and a schema, but the time value changes when I write that DataFrame to a Parquet file. Can anyone help? Thank you, Sudhir
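A frequent cause of "changed" timestamps is rendering, not data loss: Parquet stores an absolute instant (UTC-based), and readers display it in the JVM or session timezone, so the printed text can differ while the underlying value is intact. A minimal pure-Java illustration of that distinction (no Spark required):

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZonedDateTime;

public class TimestampDemo {
    public static void main(String[] args) {
        // One absolute instant, analogous to what Parquet stores internally.
        Instant instant = Instant.parse("2018-01-11T10:00:00Z");

        // The same instant rendered in two zones looks different...
        ZonedDateTime inUtc = instant.atZone(ZoneId.of("UTC"));
        ZonedDateTime inNy  = instant.atZone(ZoneId.of("America/New_York"));
        System.out.println("UTC view: " + inUtc);
        System.out.println("NY view:  " + inNy);

        // ...but the underlying epoch value is identical.
        System.out.println(inUtc.toInstant().equals(inNy.toInstant())); // true
    }
}
```

So the first thing worth checking is whether the reader that shows the "wrong" time is simply in a different timezone than the writer.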

Re: Custom line/record delimiter

2018-01-01 Thread sk skk
CSV and see if applicable.

Thanks.

2017-12-30 2:19 GMT+09:00 sk skk <spark.s...@gmail.com>:

> Hi,
>
> Do we have an option to write a csv or text file with a custom
> record/line separator through spark ?
>
> I co

Custom line/record delimiter

2017-12-29 Thread sk skk
Hi, do we have an option to write a CSV or text file with a custom record/line separator through Spark? I could not find any reference in the API. I have an issue while loading data into a warehouse: one of the columns in the CSV contains a newline character, and the warehouse is not letting me escape that
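Two angles on this, both hedged against the Spark version in use: newer releases add a `lineSep` option (on the text source from 2.4, and later on CSV), and for reading multi-line records CSV has a `multiLine` option. Independent of Spark, the standard fix for an embedded newline in a CSV field is RFC 4180 quoting: wrap the field in double quotes and double any embedded quotes, which most warehouses accept. A small self-contained helper sketch:

```java
public class CsvQuote {
    /** Quote a field per RFC 4180 if it contains a delimiter, quote, or newline. */
    static String quoteField(String field) {
        if (field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r")) {
            return "\"" + field.replace("\"", "\"\"") + "\"";
        }
        return field;
    }

    public static void main(String[] args) {
        System.out.println(quoteField("plain"));        // plain
        System.out.println(quoteField("a,b"));          // "a,b"
        System.out.println(quoteField("line1\nline2")); // quoted, newline preserved
    }
}
```

Applied before the write (e.g. in a map over the rows), this keeps the embedded newline inside a quoted field instead of splitting the record.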

Sparkcontext on udf

2017-10-18 Thread sk skk
I have registered a UDF with the SQLContext. When I try to read another Parquet file using the SQLContext inside that same UDF, it throws a NullPointerException. Any help on how to access the SQLContext inside a UDF? Regards, Sk
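A UDF body executes on the executors, where the driver-side SQLContext/SparkSession is not usable (hence the NullPointerException). The usual workaround is to read the second Parquet file on the driver first, collect (or broadcast) the needed values as a plain data structure, and let the UDF close over that instead of the context. Sketched here Spark-free, with a plain Map standing in for the broadcast lookup; all names are illustrative:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class UdfLookupSketch {
    public static void main(String[] args) {
        // Driver side: materialize the lookup data once. In Spark this would
        // be something like df2.collectAsList() plus sparkContext.broadcast(...).
        Map<String, String> lookup = new HashMap<>();
        lookup.put("id-1", "valueA");
        lookup.put("id-2", "valueB");

        // The "UDF": closes over plain data, never over a SQLContext.
        Function<String, String> udf = id -> lookup.getOrDefault(id, "unknown");

        System.out.println(udf.apply("id-1")); // valueA
        System.out.println(udf.apply("id-9")); // unknown
    }
}
```

The design point is that only serializable, context-free data should be captured by the UDF closure; any actual Spark reads belong on the driver, before the UDF is applied.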

Appending column to a parquet

2017-10-17 Thread sk skk
Hi, I have two Parquet files with different schemas. Based on a unique key I have to fetch one column value and append it to all rows of the other Parquet file. I tried a join, but I guess it is not working due to the differing schemas. I can use withColumn, but can we get a single value of a column and assign it to a
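Since the value being attached is a single scalar rather than a join key, one common pattern is to pull it out on the driver with `first()` and attach it as a constant column with `withColumn` plus `lit`. A hedged sketch against the Spark 2.x Java API; the paths and column names are assumptions:

```java
import static org.apache.spark.sql.functions.lit;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class AppendScalarColumn {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder().appName("append-scalar").getOrCreate();

        Dataset<Row> main  = spark.read().parquet("/path/main.parquet");  // assumed path
        Dataset<Row> other = spark.read().parquet("/path/other.parquet"); // assumed path

        // Fetch the single value on the driver (assumes exactly one row matches).
        String value = other.select("theColumn").first().getString(0);

        // Append it as a constant column to every row of the main file.
        Dataset<Row> result = main.withColumn("appended", lit(value));
        result.write().parquet("/path/out.parquet");
    }
}
```

If the value must come from a filtered lookup, add a `.filter(...)` before `first()`; a `crossJoin` with a one-row DataFrame is the distributed alternative when collecting to the driver is undesirable.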

Java RDD of String to DataFrame

2017-10-11 Thread sk skk
Can we create a DataFrame from a Java pair RDD of String? I don't have a schema, as it will be dynamic JSON. I tried the Encoders.STRING class. Any help is appreciated! Thanks, SK
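For dynamic JSON there is no need to hand Spark a schema: wrap the strings in a `Dataset<String>` and pass it to the JSON reader, which infers the (possibly nested) schema from the data (`spark.read().json(Dataset<String>)` exists from Spark 2.2; older versions accept a `JavaRDD<String>` overload). A hedged sketch with inline sample values standing in for the real RDD:

```java
import java.util.Arrays;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class JsonStringsToDataFrame {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder().appName("json-to-df").getOrCreate();

        // Stand-in for the RDD of dynamic JSON documents.
        Dataset<String> jsonLines = spark.createDataset(
                Arrays.asList("{\"a\":1,\"b\":{\"c\":\"x\"}}", "{\"a\":2}"),
                Encoders.STRING());

        // Spark infers the nested schema from the documents themselves.
        Dataset<Row> df = spark.read().json(jsonLines);
        df.printSchema();
        df.show();
    }
}
```

Note that inference scans the data, so for very large inputs it can be worth sampling or caching the string Dataset first.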

how to fetch schema from a dynamic nested JSON

2017-08-12 Thread sk skk
Hi, I have a requirement to read a dynamic nested JSON for its schema and check the data quality based on that schema, i.e. I get the details from a JSON, say column 1 should be a string of a certain length, and so on. This is a dynamic, nested JSON, so traditionally I have to loop the
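One way to avoid hand-looping each level is a recursive walk over rules and data together. This sketch assumes the rule JSON has already been parsed (e.g. with Jackson) into nested Maps, where a leaf rule is a map like {type, maxLength} and anything else is a nested object; all names and the rule shape are illustrative assumptions, not the poster's actual format:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class NestedRuleChecker {
    /** Recursively check data against rules; collect violations as "path: message". */
    @SuppressWarnings("unchecked")
    static void check(Map<String, Object> rules, Map<String, Object> data,
                      String path, List<String> errors) {
        for (Map.Entry<String, Object> e : rules.entrySet()) {
            String key = e.getKey();
            String here = path.isEmpty() ? key : path + "." + key;
            Object rule = e.getValue();
            Object value = data.get(key);
            if (rule instanceof Map && ((Map<String, Object>) rule).containsKey("type")) {
                // Leaf rule: validate type and optional maxLength.
                Map<String, Object> r = (Map<String, Object>) rule;
                if ("string".equals(r.get("type"))) {
                    if (!(value instanceof String)) {
                        errors.add(here + ": expected string");
                    } else if (r.containsKey("maxLength")
                            && ((String) value).length() > (int) r.get("maxLength")) {
                        errors.add(here + ": too long");
                    }
                }
            } else if (rule instanceof Map) {
                // Nested object: recurse one level deeper.
                Map<String, Object> child = value instanceof Map
                        ? (Map<String, Object>) value : new HashMap<>();
                check((Map<String, Object>) rule, child, here, errors);
            }
        }
    }
}
```

The recursion handles arbitrary nesting depth, and the dotted `path` in each error message points at the offending field.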