Spark preserve timestamp

2018-01-12 Thread sk skk
Do we have an option to tell Spark to preserve the timestamp while creating a
struct?

Regards,
Sudhir
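
A minimal sketch, assuming Spark 2.x and a DataFrame that already has a
TimestampType column (the id/event_time names are hypothetical):
functions.struct preserves the data type of each child column, so the
timestamp should stay a TimestampType field inside the struct rather than
being cast.

// Sketch: wrap existing columns in a struct without changing their types.
import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.struct;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

public class StructTimestampSketch {
    public static Dataset<Row> wrapTimestamp(Dataset<Row> df) {
        // struct(...) keeps each child column's data type, so event_time
        // stays a TimestampType field inside the new "payload" struct.
        return df.withColumn("payload", struct(col("id"), col("event_time")));
    }
}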


Timestamp changing while writing

2018-01-11 Thread sk skk
Hello,

I am using createDataFrame and passing a Java row RDD and a schema, but the
time value changes when I write that data frame to a Parquet file.

Can anyone help?

Thank you,
Sudhir
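
A minimal sketch of one thing to check, assuming Spark 2.2+ (for the
spark.sql.session.timeZone setting): Parquet timestamps are stored relative to
UTC and rendered in the session time zone, so pinning that time zone is a
common way to keep the wall-clock value stable. The column and path names here
are hypothetical.

import java.sql.Timestamp;
import java.util.Arrays;

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class TimestampParquetSketch {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("timestamp-parquet")
                // Pin the session time zone so timestamps are not shifted into
                // the JVM default zone when written to / read back from Parquet.
                .config("spark.sql.session.timeZone", "UTC")
                .getOrCreate();

        JavaSparkContext jsc = new JavaSparkContext(spark.sparkContext());

        StructType schema = new StructType()
                .add("id", DataTypes.StringType)
                .add("event_time", DataTypes.TimestampType);

        JavaRDD<Row> rows = jsc.parallelize(Arrays.asList(
                RowFactory.create("a", Timestamp.valueOf("2018-01-11 10:15:30"))));

        Dataset<Row> df = spark.createDataFrame(rows, schema);
        df.write().mode("overwrite").parquet("/tmp/events_parquet");

        // Read back under the same session time zone; the wall-clock value
        // should match what was written.
        spark.read().parquet("/tmp/events_parquet").show(false);
    }
}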


Re: Custom line/record delimiter

2018-01-01 Thread sk skk
Thanks for the update Kwon.

Regards,


On Mon, Jan 1, 2018 at 7:54 PM Hyukjin Kwon <gurwls...@gmail.com> wrote:

> Hi,
>
>
> There's a PR - https://github.com/apache/spark/pull/18581 and JIRA
> - SPARK-21289
>
> Alternatively, you could check out the multiLine option for CSV and see if
> it is applicable.
>
>
> Thanks.
>
>
> 2017-12-30 2:19 GMT+09:00 sk skk <spark.s...@gmail.com>:
>
>> Hi,
>>
>> Do we have an option to write a CSV or text file with a custom
>> record/line separator through Spark?
>>
>> I could not find any reference in the API. I have an issue while loading data
>> into a warehouse, as one of the columns in the CSV has a newline character and
>> the warehouse will not let me escape that newline character.
>>
>> Thank you ,
>> Sk
>>
>
>
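
A minimal sketch of the multiLine option mentioned above, assuming Spark 2.2+
and a CSV file whose quoted fields may contain embedded newlines (the path is
hypothetical):

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class MultiLineCsvSketch {
    public static Dataset<Row> readCsv(SparkSession spark) {
        return spark.read()
                .option("header", "true")
                // Treat quoted newlines as part of a field, not a record break.
                .option("multiLine", "true")
                .csv("/tmp/input_with_embedded_newlines.csv");
    }
}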


Custom line/record delimiter

2017-12-29 Thread sk skk
Hi,

Do we have an option to write a CSV or text file with a custom record/line
separator through Spark?

I could not find any reference in the API. I have an issue while loading data into
a warehouse, as one of the columns in the CSV has a newline character and the
warehouse will not let me escape that newline character.

Thank you ,
Sk
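
A minimal sketch of one common workaround, assuming Spark 2.x: replace the
embedded newlines in the offending column before writing, since the CSV writer
at the time had no custom record-delimiter option. The column and path names
are hypothetical.

import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.regexp_replace;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

public class CsvNewlineWorkaroundSketch {
    public static void writeWithoutNewlines(Dataset<Row> df, String outputPath) {
        // Collapse carriage returns / line feeds inside the column to a space
        // so every output record stays on a single physical line.
        df.withColumn("comments", regexp_replace(col("comments"), "[\\r\\n]+", " "))
          .write()
          .option("header", "true")
          .mode("overwrite")
          .csv(outputPath);
    }
}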


Sparkcontext on udf

2017-10-18 Thread sk skk
I have registered a UDF with sqlContext. When I try to read another Parquet
file using sqlContext inside the same UDF, it throws a null pointer
exception.

Any help on how to access sqlContext inside a UDF?

Regards,
Sk
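
A minimal sketch of one alternative, assuming Spark 2.x: the
SQLContext/SparkSession lives on the driver and is not available inside a UDF
running on an executor, which is the usual cause of the NullPointerException.
Reading the second Parquet file on the driver, collecting it into a map, and
broadcasting it lets the UDF do the lookup without touching the context. All
names and paths here are hypothetical.

import java.util.HashMap;
import java.util.Map;

import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.broadcast.Broadcast;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.api.java.UDF1;
import org.apache.spark.sql.types.DataTypes;

public class LookupUdfSketch {
    public static void register(SparkSession spark) {
        // Driver-side read of the lookup Parquet file.
        Dataset<Row> lookupDf = spark.read().parquet("/tmp/lookup_parquet");

        Map<String, String> lookup = new HashMap<>();
        for (Row r : lookupDf.collectAsList()) {
            lookup.put(r.getString(0), r.getString(1));
        }

        JavaSparkContext jsc = new JavaSparkContext(spark.sparkContext());
        Broadcast<Map<String, String>> bLookup = jsc.broadcast(lookup);

        // The UDF only touches the broadcast value, never the SQLContext.
        spark.udf().register("lookupValue",
                (UDF1<String, String>) key -> bLookup.value().get(key),
                DataTypes.StringType);
    }
}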


Appending column to a parquet

2017-10-17 Thread sk skk
Hi ,

I have two Parquet files with different schemas. Based on a unique key, I have
to fetch one column value and append it to all rows of the other Parquet file.

I tried a join, but I guess due to the different schemas it is not working. I
can use withColumn, but can we get a single value of a column and assign it to
a literal? If I register it as a temp table and fetch that column value,
assigning it to a string gives me the Row's string representation rather than
a literal.

Is there a better way to handle this, or how can I get a literal value from a
temporary table?


Thank you ,
Sk
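
A minimal sketch of one way to do this, assuming Spark 2.x: pull the single
value to the driver with first()/getString() and append it to every row with
withColumn and lit(). Column and path names are hypothetical.

import static org.apache.spark.sql.functions.lit;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class AppendLiteralColumnSketch {
    public static Dataset<Row> appendBatchId(SparkSession spark) {
        Dataset<Row> source = spark.read().parquet("/tmp/source_parquet");
        Dataset<Row> target = spark.read().parquet("/tmp/target_parquet");

        // first() returns a Row; extract the column value as a plain String
        // instead of relying on the Row's toString() representation.
        String batchId = source.select("batch_id").first().getString(0);

        // lit() turns the driver-side value into a constant column on every row.
        return target.withColumn("batch_id", lit(batchId));
    }
}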


Java Rdd of String to dataframe

2017-10-11 Thread sk skk
Can we create a DataFrame from a Java pair RDD of String? I don't have a
schema as it will be dynamic JSON. I used the Encoders.STRING class.

Any help is appreciated !!

Thanks,
SK
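
A minimal sketch, assuming Spark 2.2+ and a JavaRDD<String> in which each
element is one JSON document: convert it to a Dataset<String> with
Encoders.STRING() and let the JSON reader infer the schema, so no schema has
to be declared up front.

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class JsonRddToDataFrameSketch {
    public static Dataset<Row> toDataFrame(SparkSession spark, JavaRDD<String> jsonRdd) {
        // Dataset<String> of raw JSON documents; Encoders.STRING() supplies the encoder.
        Dataset<String> jsonDs = spark.createDataset(jsonRdd.rdd(), Encoders.STRING());
        // The JSON source infers a schema from the (possibly dynamic) documents.
        return spark.read().json(jsonDs);
    }
}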


How to fetch the schema from a dynamic nested JSON

2017-08-12 Thread sk skk
Hi,

I have a requirement where I have to read a dynamic nested JSON for the schema
and check the data quality based on that schema.

I.e., I get the details from a JSON, say that column 1 should be a string with
a certain length, and so on. This is a dynamic, nested JSON, so traditionally I
have to loop over the JSON object and fetch all the details.

Coming to the data array, I have to read a JSON array where each JSON object
should be checked against the above JSON schema, i.e., in the JSON array the
first JSON object's first column should match the string/length rule.

Looping over the schema JSON and, inside that, looping over this data array
will have a performance impact. Do we have any options or a better way to
handle this?


Thanks in advance.
sk
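
A minimal sketch of one approach, assuming the schema JSON can be reduced to
simple per-column rules (type and length) and using the Jackson library that
ships with Spark: parse the rule JSON once on the driver, then make a single
pass over the data array. The rule format and field names are hypothetical. If
the data array is large, the same per-record check can be moved into a map
over a Dataset so it runs in parallel on the executors.

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

public class JsonQualityCheckSketch {
    public static boolean validate(String schemaJson, String dataArrayJson) throws Exception {
        ObjectMapper mapper = new ObjectMapper();

        // Parse the rule JSON exactly once, e.g.
        // [{"name":"col1","type":"string","length":10}, ...]
        JsonNode rules = mapper.readTree(schemaJson);
        JsonNode records = mapper.readTree(dataArrayJson);

        for (JsonNode rec : records) {
            for (JsonNode rule : rules) {
                String name = rule.get("name").asText();
                JsonNode value = rec.get(name);
                if (value == null) {
                    return false;                       // required column missing
                }
                if ("string".equals(rule.get("type").asText()) && !value.isTextual()) {
                    return false;                       // type mismatch
                }
                if (rule.has("length") && value.asText().length() > rule.get("length").asInt()) {
                    return false;                       // length rule violated
                }
            }
        }
        return true;
    }
}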