save a histogram to a file

2015-01-22 Thread SK
Hi, histogram() returns an object that is a pair of Arrays. There appears to be no saveAsTextFile() for this paired object. Currently I am using the following to save the output to a file: val hist = a.histogram(10) val arr1 = sc.parallelize(hist._1).saveAsTextFile(file1) val arr2 =
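One common answer: `RDD.histogram(n)` returns a pair — `n+1` bucket boundaries and `n` counts — and since there are only `n` buckets, it is safe to pair them up on the driver and write a single readable file. A minimal plain-Python sketch of the pairing step (the boundary and count values below are invented for illustration):

```python
# Sketch: pair the (buckets, counts) tuple that histogram() returns into
# one line per bucket, so a single text file can be written on the driver.
# histogram(n) yields n+1 boundaries and n counts; values here are examples.
buckets = [0.0, 2.0, 4.0, 6.0, 8.0, 10.0]  # n+1 boundaries for n buckets
counts = [3, 7, 5, 2, 1]                   # one count per bucket

lines = [
    f"{lo},{hi},{count}"
    for (lo, hi), count in zip(zip(buckets, buckets[1:]), counts)
]
# Each line is "bucket_start,bucket_end,count"
```

Writing `"\n".join(lines)` to a local file avoids the two separate `saveAsTextFile` calls in the snippet above.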

filter expression in API document for DataFrame

2015-03-24 Thread SK
The following statement appears in the Scala API example at https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.DataFrame: people.filter("age" > 30). I tried this example and it gave a compilation error. I think this needs to be changed to people.filter(people("age") > 30)
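A toy sketch (plain Python, not Spark itself) of why the column-expression form works where a bare comparison cannot: `>` applied to a Column object *builds* a predicate expression instead of evaluating a boolean immediately. The `Column` class below is a made-up stand-in for illustration:

```python
# Toy sketch: why people("age") > 30 works -- comparison operators on a
# Column object return a predicate expression rather than a boolean.
class Column:
    def __init__(self, name):
        self.name = name

    def __gt__(self, value):
        # Return a predicate (row -> bool), not a boolean
        return lambda row: row[self.name] > value

people = [{"name": "a", "age": 25}, {"name": "b", "age": 40}]
pred = Column("age") > 30          # builds a predicate, does not compare yet
over_30 = [row for row in people if pred(row)]
```

Spark's real `Column` overloads operators the same way, which is why the filter argument must be a column expression (or a SQL string), not a raw identifier.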

column expression in left outer join for DataFrame

2015-03-24 Thread SK
Hi, I am trying to port code that was working in Spark 1.2.0 to the latest version, Spark 1.3.0. The code involves a left outer join between two SchemaRDDs, which I am now trying to change to a left outer join between two DataFrames. I followed the example for a left outer join of DataFrames at

RandomForest in Pyspark (version 1.4.1)

2015-07-31 Thread SK
Hi, I tried to develop a RandomForest model for my data in PySpark as follows: rf_model = RandomForest.trainClassifier(train_idf, 2, {}, numTrees=15, seed=144) print "RF: Num trees = %d, Num nodes = %d\n" % (rf_model.numTrees(), rf_model.totalNumNodes()) pred_label =

Parsing nested json objects with variable structure

2015-08-31 Thread SK
Hi, I need to parse a json input file where the nested objects take on a different structure based on the typeId field, as follows: { "d": { "uid" : "12345" "contents": [{"info": {"eventId": "event1"}, "typeId": 19}] } } { "d": { "uid" : "56780"
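One way to handle nested objects whose structure varies is to dispatch on the discriminator field. A minimal Python sketch using the field names from the snippet (`d`, `uid`, `contents`, `info`, `typeId`); the fallback branch and the `eventId`-only handling for type 19 are assumptions for illustration:

```python
import json

# Sketch: dispatch on typeId to parse nested objects whose "info" shape
# varies by type. Field names follow the snippet; the handling per type
# is an illustrative assumption.
record = json.loads(
    '{"d": {"uid": "12345", '
    '"contents": [{"info": {"eventId": "event1"}, "typeId": 19}]}}'
)

def parse_content(c):
    if c["typeId"] == 19:
        return {"event": c["info"]["eventId"]}
    # unknown type: keep the raw info rather than fail
    return {"raw": c["info"]}

parsed = [parse_content(c) for c in record["d"]["contents"]]
```

In Spark the same idea applies after loading the JSON: branch on the `typeId` column (or in a map function) rather than assuming one fixed nested schema.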

how to fetch schema from a dynamic nested JSON

2017-08-12 Thread sk skk
this data array, which will have a performance impact. Do we have any options or a better way to handle this? Thanks in advance. sk

Java Rdd of String to dataframe

2017-10-11 Thread sk skk
Can we create a DataFrame from a Java pair RDD of String? I don't have a schema, as it will be dynamic JSON. I used the Encoders.STRING class. Any help is appreciated!! Thanks, SK
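When no schema is known up front, the usual route in Spark is to let the JSON reader infer one from the strings themselves. A plain-Python sketch of the inference idea — take the union of keys seen across records, with `None` for missing fields (the sample records are invented):

```python
import json

# Sketch of schema inference for dynamic JSON: the schema is the union of
# keys observed across all records, and absent fields become nulls.
# (Spark's DataFrame JSON reader performs a similar inference pass.)
records = ['{"a": 1, "b": 2}', '{"b": 3, "c": 4}']

schema = sorted({key for r in records for key in json.loads(r)})
rows = [tuple(json.loads(r).get(k) for k in schema) for r in records]
```

Once a schema like this exists, each row is a fixed-width tuple and can back a proper DataFrame, rather than a single-string column.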

Appending column to a parquet

2017-10-17 Thread sk skk
it to a literal. If I register it as a temp table, fetch that column value, and assign it to a string, it returns a row-to-string schema and not a literal. Is there a better way to handle this, or how can I get a literal value from a temporary table? Thank you, Sk

Sparkcontext on udf

2017-10-18 Thread sk skk
I have registered a UDF with sqlContext. When I try to read another parquet file using sqlContext inside the same UDF, it throws a NullPointerException. Any help on how to access sqlContext inside a UDF? Regards, Sk
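The usual explanation: a UDF runs on the executors, where the driver-side context is null, so reads must happen on the driver first and the resulting *data* (not the context) be captured by the UDF. A minimal Python sketch of that pattern; the lookup values are invented for illustration:

```python
# Sketch: instead of calling a context inside the UDF, read the lookup data
# on the driver and close over plain data in the UDF body.
lookup = {"k1": "v1", "k2": "v2"}   # stands in for data read on the driver

def make_udf(table):
    # the returned function captures plain data, which serializes cleanly
    # to executors; a context object would not
    return lambda key: table.get(key, "missing")

udf = make_udf(lookup)
```

In Spark the captured table would typically be broadcast if it is large; the key point is that the UDF body must be context-free.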

Custom line/record delimiter

2017-12-29 Thread sk skk
that newline character. Thank you, Sk

Spark preserve timestamp

2018-01-12 Thread sk skk
Do we have an option to tell Spark to preserve the timestamp while creating a struct? Regards, Sudhir

Timestamp changing while writing

2018-01-11 Thread sk skk
Hello, I am using createDataFrame, passing a Java row RDD and a schema, but the time value changes when I write that DataFrame to a parquet file. Can anyone help? Thank you, Sudhir
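A frequent cause of "changing" timestamps is timezone normalization rather than data loss: Parquet stores a UTC instant, so the same instant rendered under a different offset shows a different wall-clock value. A plain-Python sketch of that effect; the -8h offset is an arbitrary example:

```python
from datetime import datetime, timezone, timedelta

# Sketch: the stored instant is unchanged; only its rendering differs.
pst = timezone(timedelta(hours=-8))
local = datetime(2018, 1, 11, 9, 0, tzinfo=pst)   # 09:00 at UTC-8
as_utc = local.astimezone(timezone.utc)           # what gets persisted
# as_utc is 17:00 UTC on the same day -- same instant, different wall clock
```

Checking whether the driver JVM timezone differs from the timezone assumed when the row values were built is usually the first diagnostic step.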

Re: Custom line/record delimiter

2018-01-01 Thread sk skk
CSV and see if > applicable. > > > Thanks. > > > 2017-12-30 2:19 GMT+09:00 sk skk <spark.s...@gmail.com>: > >> Hi, >> >> Do we have an option to write a csv or text file with a custom >> record/line separator through spark ? >> >> I co
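Independent of how Spark is configured, the core of a custom record delimiter is simple: join records with the delimiter on write and split on it on read. A plain-Python sketch; the `|~|` delimiter and the record values are arbitrary choices for illustration:

```python
# Sketch: a custom record delimiter instead of "\n" -- join on write,
# split on read.
DELIM = "|~|"
records = ["row one", "row two", "row three"]

blob = DELIM.join(records)    # what a custom-delimited file would contain
parsed = blob.split(DELIM)    # recovering the records on read
```

This round-trips any records that do not themselves contain the delimiter, which is the same constraint a Spark custom-delimiter configuration would carry.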
