Spark preserve timestamp

2018-01-12 Thread sk skk
Do we have an option to tell Spark to preserve the timestamp while creating


Timestamp changing while writing

2018-01-11 Thread sk skk

I am using createDataFrame, passing a Java Row RDD and a schema, but the
time value changes when I write that DataFrame to a Parquet file.

Can anyone help?

Thank you,
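
The message gives no sample values, so the cause can only be guessed. One frequent cause of this symptom is time zone handling: a timestamp is stored as an instant and rendered in whatever zone the reader uses. The sketch below illustrates that effect with Python's standard library only, not Spark; the two offsets are hypothetical stand-ins for the writer's and reader's zones.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical zones: the writer runs at UTC-5, the reader at UTC+9.
writer_zone = timezone(timedelta(hours=-5))
reader_zone = timezone(timedelta(hours=9))

# Wall-clock value as the writer sees it.
wall_clock = datetime(2018, 1, 11, 9, 30, tzinfo=writer_zone)

# Normalised to UTC, as an instant-based storage format would keep it.
as_utc = wall_clock.astimezone(timezone.utc)

# Rendered again in the reader's zone.
as_seen_by_reader = as_utc.astimezone(reader_zone)

print(wall_clock.isoformat())         # 2018-01-11T09:30:00-05:00
print(as_utc.isoformat())             # 2018-01-11T14:30:00+00:00
print(as_seen_by_reader.isoformat())  # 2018-01-11T23:30:00+09:00

# The displayed hour differs, but all three values are the same instant.
print(wall_clock == as_seen_by_reader)  # True
```

If this is what is happening, the data itself is unchanged and only the displayed zone differs. In Spark the session time zone is controlled by the spark.sql.session.timeZone setting, which may be worth checking on both the writing and the reading side.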

Re: Custom line/record delimiter

2018-01-01 Thread sk skk
Thanks for the update Kwon.


On Mon, Jan 1, 2018 at 7:54 PM Hyukjin Kwon wrote:

> Hi,
> There's a PR - and JIRA
> - SPARK-21289
> Alternatively, you could check out the multiLine option for CSV and see if
> it is applicable.
> Thanks.
> 2017-12-30 2:19 GMT+09:00 sk skk:
>> Hi,
>> Do we have an option to write a CSV or text file with a custom
>> record/line separator through Spark?
>> I could not find any reference in the API. I have an issue while loading
>> data into a warehouse, as one of the columns in the CSV contains a newline
>> character and the warehouse does not let me escape that newline character.
>> Thank you,
>> Sk

Custom line/record delimiter

2017-12-29 Thread sk skk

Do we have an option to write a CSV or text file with a custom record/line
separator through Spark?

I could not find any reference in the API. I have an issue while loading data
into a warehouse, as one of the columns in the CSV contains a newline
character and the warehouse does not let me escape that newline character.

Thank you,
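
To make the idea concrete, here is a minimal sketch of writing records with a separator that cannot collide with a newline inside a field. It uses Python's standard csv module rather than Spark, and the separator character and sample rows are arbitrary choices for illustration. Newer Spark releases document a lineSep option for the CSV and text sources; whether it is available depends on the Spark version in use.

```python
import csv
import io

# Sample rows; the second field of the first row contains a newline.
rows = [
    ["1", "first line\nsecond line"],
    ["2", "plain value"],
]

# ASCII "record separator" control character, an arbitrary choice here.
RECORD_SEP = "\x1e"


def to_delimited(rows, record_sep=RECORD_SEP):
    """Serialise rows as CSV, ending each record with record_sep."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator=record_sep)
    writer.writerows(rows)
    return buffer.getvalue()


output = to_delimited(rows)

# Splitting on the custom separator recovers one chunk per record,
# even though the first record still contains an embedded newline.
records = output.split(RECORD_SEP)[:-1]
print(len(records))  # 2
```

The loader on the warehouse side would then need to be told to split records on the same character instead of on newlines.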

Sparkcontext on udf

2017-10-18 Thread sk skk
I have registered a UDF with the SQLContext. When I try to read another
Parquet file using the SQLContext inside that same UDF, it throws a
NullPointerException.

Is there any way to access the SQLContext inside a UDF?


Appending column to a parquet

2017-10-17 Thread sk skk
Hi,

I have two Parquet files with different schemas. Based on a unique value, I
have to fetch one column value and append it to all rows of the Parquet file.

I tried a join, but I guess it is not working because of the different
schemas. I can use withColumn, but can we get a single value of a column and
assign it to a literal? If I register the data as a temp table, fetch that
column value, and assign it to a string, it returns a Row converted to a
string rather than a literal.

Is there a better way to handle this, or a way to get a literal value from a
temporary table?

Thank you,

Java Rdd of String to dataframe

2017-10-11 Thread sk skk
Can we create a DataFrame from a Java pair RDD of String? I don’t have a
schema, as it will be dynamic JSON. I passed the Encoders.STRING() encoder.

Any help is appreciated!


how to fetch schema from a dynamic nested JSON

2017-08-12 Thread sk skk

I have a requirement to read a dynamic nested JSON that describes a schema,
and then check data quality against that schema.

That is, the JSON gives the details for each column, e.g. column 1 should be
a string of a given length. This JSON is dynamic and nested, so traditionally
I would have to loop over the JSON object and fetch all the details.

For the data, I have to read a JSON array in which each JSON object should be
checked against the schema above, e.g. in the first JSON object of the array,
the first column's data should be a string and match the length.

Looping over the schema JSON and, inside that, looping over the data array
will hurt performance. Are there any options or a better way to handle this?

Thanks in advance.
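
One way to avoid re-walking the nested schema for every record is to flatten it once into a lookup of paths to rules, then check each record in a single pass. The sketch below does this in plain Python; the schema layout (leaf objects with "type" and "max_length") and the sample data are hypothetical, since the message does not show the real format.

```python
import json

# Hypothetical schema layout: nested objects whose leaves hold a rule.
SCHEMA_JSON = """
{
  "name": {"type": "string", "max_length": 5},
  "address": {
    "city": {"type": "string", "max_length": 10},
    "zip": {"type": "number"}
  }
}
"""

TYPES = {"string": str, "number": (int, float)}


def flatten_schema(node, prefix=()):
    """Walk the nested schema once and return {path_tuple: rule}."""
    rules = {}
    for key, value in node.items():
        path = prefix + (key,)
        if isinstance(value.get("type"), str):
            rules[path] = value  # leaf rule
        else:
            rules.update(flatten_schema(value, path))
    return rules


def lookup(record, path):
    """Follow a path into a nested record; None if a key is missing."""
    current = record
    for key in path:
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def validate(record, rules):
    """Return a list of problems found in one record."""
    problems = []
    for path, rule in rules.items():
        value = lookup(record, path)
        name = ".".join(path)
        if not isinstance(value, TYPES[rule["type"]]):
            problems.append(f"{name}: expected {rule['type']}")
        elif isinstance(value, str) and len(value) > rule.get("max_length", len(value)):
            problems.append(f"{name}: longer than {rule['max_length']}")
    return problems


# The schema is flattened once, outside the loop over the data array.
RULES = flatten_schema(json.loads(SCHEMA_JSON))

records = json.loads("""
[
  {"name": "Ann", "address": {"city": "Oslo", "zip": 150}},
  {"name": "Bartholomew", "address": {"city": "Rome", "zip": "00100"}}
]
""")

report = [validate(record, RULES) for record in records]
print(report)
# [[], ['name: longer than 5', 'address.zip: expected number']]
```

Each record is still checked against every rule, so this does not remove the per-record work; it only removes the repeated traversal of the nested schema. Because validate depends on nothing but the record and the flattened rules, it could in principle be applied to records in parallel.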