I have registered a UDF with sqlContext. When I try to read another
Parquet file using sqlContext inside that same UDF, it throws a
NullPointerException.
Any help on how to access sqlContext inside a UDF?
Regards,
Sk
As you may be aware, the granularity Spark Streaming offers is
micro-batching, and the batch interval is effectively limited to about 0.5
seconds. So if you have continuous ingestion of data, Spark Streaming may
not be granular enough for CEP. You may want to consider other products.
Worth looking at this old thread of mine "Spark
Hello all,
Has anyone used Spark Streaming for CEP (Complex Event Processing)? Are
there any CEP libraries that work well with Spark? I have a use case for
CEP and am trying to see if Spark Streaming is a good fit.
Currently we have a data pipeline using Kafka, Spark Streaming and
Cassandra for data
I ran a series of TeraGen jobs via spark-submit (each job generated an
equal-sized dataset into a different S3 bucket).
I noticed that some jobs were fast and some were slow.
Slow jobs always had many log lines like
DEBUG TaskSchedulerImpl: parentName: , name: TaskSet_1.0, runningTasks: 1
(
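For reference, scheduler lines like the one above come from Spark's `org.apache.spark.scheduler.TaskSchedulerImpl` logger; a minimal log4j.properties fragment (assuming Spark's stock log4j configuration) that enables them looks like:

```properties
# Turn on DEBUG output for the task scheduler only,
# keeping the rest of Spark at the default level.
log4j.logger.org.apache.spark.scheduler.TaskSchedulerImpl=DEBUG
```

Comparing the cadence of these lines between fast and slow jobs can show whether the scheduler is repeatedly offering resources while few tasks are actually running.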
Hi Debu,
First, instead of using '+', you can use 'concat' to concatenate string
columns, and you should wrap "0" in lit() to make it a column.
Second, 1440 becomes null because you didn't tell Spark what to do when the
when clause does not match, so it simply sets the value to null. To fix this,
Hi,
I have a dataframe column (the column name is CTOFF)
and I intend to prefix it with '0' when the length of the value is 3.
Unfortunately, I am unable to achieve my goal and wonder whether you can
help me here.
Command which I am executing:
ctoff_dedup_prep_temp =
Yes, I think I figured out something like the below.
Serialized Java Class
-
public class MyMapPartition implements Serializable, MapPartitionsFunction {
    @Override
    public Iterator call(Iterator iter) throws Exception {
        ArrayList list = new ArrayList();
        // ArrayNode array =