SparkContext in UDF

2017-10-18 Thread sk skk
I have registered a UDF with SQLContext. When I try to read another parquet file using the SQLContext inside that same UDF, it throws a NullPointerException. Any help on how to access the SQLContext inside a UDF? Regards, Sk

Re: Spark streaming for CEP

2017-10-18 Thread Mich Talebzadeh
As you may be aware, the finest granularity Spark Streaming offers is micro-batching, and that is limited to about 0.5 seconds. So if you have continuous ingestion of data, Spark Streaming may not be granular enough for CEP. You may want to consider other products. Worth looking at this old thread of mine, "Spark

Spark streaming for CEP

2017-10-18 Thread anna stax
Hello all, Has anyone used Spark Streaming for CEP (Complex Event Processing)? Are there any CEP libraries that work well with Spark? I have a use case for CEP and am trying to see if Spark Streaming is a good fit. Currently we have a data pipeline using Kafka, Spark Streaming and Cassandra for data

possible cause: same TeraGen job sometimes slow and sometimes fast

2017-10-18 Thread Gil Vernik
I performed a series of TeraGen jobs via spark-submit (each job generated an equal-size dataset into a different S3 bucket). I noticed that some jobs were fast and some were slow. Slow jobs always had many log lines like: DEBUG TaskSchedulerImpl: parentName: , name: TaskSet_1.0, runningTasks: 1 (

Re: Need help with String Concat Operation

2017-10-18 Thread 高佳翔
Hi Debu, First, instead of using '+', you can use 'concat' to concatenate string columns, and you should wrap "0" in "lit()" to make it a column. Second, 1440 becomes null because you didn't tell Spark what to do when the when clause does not match, so it simply sets the value to null. To fix this,

Need help with String Concat Operation

2017-10-18 Thread Debabrata Ghosh
Hi, I have a dataframe column (the name of the column is CTOFF) and I intend to prefix it with '0' when the length of the value is 3. Unfortunately, I am unable to achieve my goal and wonder whether you can help me here. The command I am executing: ctoff_dedup_prep_temp =

Re: partition by multiple columns/keys

2017-10-18 Thread Imran Rajjad
Yes.. I think I figured it out, with something like the serialized Java class below -

public class MyMapPartition implements Serializable, MapPartitionsFunction {
  @Override
  public Iterator call(Iterator iter) throws Exception {
    ArrayList list = new ArrayList();
    // ArrayNode array =