Re: how to change temp directory when spark write data ?

2018-12-05 Thread Sandip Mehta
Try the spark.local.dir property. On Wed, Dec 5, 2018 at 1:42 PM JF Chen wrote: > I have two Spark apps writing data to one directory. I notice they share > one temp directory, and the Spark app that finishes writing first will clear the temp > directory, and the slower one may throw "No lease on *** File does
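A minimal sketch of the suggestion above, giving each application its own scratch directory through spark.local.dir (the paths and app name are hypothetical, and cluster managers may override this setting):

import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

// Each application gets its own local scratch directory, so one job's cleanup
// does not remove temp files the other job is still writing.
val conf = new SparkConf()
  .setAppName("writer-app-1")
  .set("spark.local.dir", "/tmp/spark-writer-app-1")  // hypothetical path

val spark = SparkSession.builder().config(conf).getOrCreate()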

Re: [Structured Streaming] Reuse computation result

2018-02-01 Thread Sandip Mehta
You can use the persist() or cache() operation on the DataFrame. On Tue, Dec 26, 2017 at 4:02 PM Shu Li Zheng wrote: > Hi all, > > I have a scenario like this: > > val df = dataframe.map().filter() > // agg 1 > val query1 = df.sum.writeStream.start > // agg 2 > val query2 =
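A batch-style sketch of reusing one computed DataFrame for two aggregations via persist(); the source path and column names are hypothetical, and streaming queries that share a source behave differently from this batch example:

import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

val spark = SparkSession.builder().appName("reuse-demo").getOrCreate()
import spark.implicits._

// Expensive upstream transformation, materialized once and reused below.
val df = spark.read.parquet("/data/events")   // hypothetical path
  .filter($"status" === "ok")
  .persist(StorageLevel.MEMORY_AND_DISK)

val bySum   = df.groupBy($"key").sum("value")   // aggregation 1
val byCount = df.groupBy($"key").count()        // aggregation 2

bySum.show()
byCount.show()
df.unpersist()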

Stateful Aggregation Using flatMapGroupsWithState

2017-12-16 Thread Sandip Mehta
Hi All, I am getting the following error message while applying *flatMapGroupsWithState*: *Exception in thread "main" org.apache.spark.sql.AnalysisException: flatMapGroupsWithState in update mode is not supported with aggregation on a streaming DataFrame/Dataset;;* Following is what I am trying to
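A minimal sketch of flatMapGroupsWithState applied directly to the keyed stream, with no groupBy(...).agg(...) upstream, which is the combination the AnalysisException forbids. The socket source, case classes, and field names are hypothetical:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.{GroupState, GroupStateTimeout, OutputMode}

case class Event(userId: String, amount: Double)
case class RunningTotal(userId: String, total: Double)

val spark = SparkSession.builder().appName("stateful-agg").getOrCreate()
import spark.implicits._

// Hypothetical source producing "user,amount" lines as a streaming Dataset.
val events = spark.readStream
  .format("socket").option("host", "localhost").option("port", 9999).load()
  .as[String]
  .map { line => val Array(u, a) = line.split(","); Event(u, a.toDouble) }

// Stateful aggregation kept inside flatMapGroupsWithState itself.
val totals = events
  .groupByKey(_.userId)
  .flatMapGroupsWithState(OutputMode.Update, GroupStateTimeout.NoTimeout) {
    (userId: String, batch: Iterator[Event], state: GroupState[Double]) =>
      val newTotal = state.getOption.getOrElse(0.0) + batch.map(_.amount).sum
      state.update(newTotal)
      Iterator(RunningTotal(userId, newTotal))
  }

val query = totals.writeStream.outputMode("update").format("console").start()
query.awaitTermination()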

Re: Row Encoder For DataSet

2017-12-07 Thread Sandip Mehta
"y").agg(...) > > Is this you want ? > > On Fri, Dec 8, 2017 at 11:51 AM, Sandip Mehta <sandip.mehta@gmail.com> > wrote: > >> Hi, >> >> During my aggregation I end up having following schema. >> >> Row(Row(val1,val2), Row(val1,val2,v

Row Encoder For DataSet

2017-12-07 Thread Sandip Mehta
Hi, During my aggregation I end up having the following schema: Row(Row(val1,val2), Row(val1,val2,val3...)) val values = Seq( (Row(10, 11), Row(10, 2, 11)), (Row(10, 11), Row(10, 2, 11)), (Row(20, 11), Row(10, 2, 11)) ) The 1st tuple is used to group the relevant records for
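A sketch of one way to handle this with an explicit Row encoder over a nested StructType (Spark 2.x-era RowEncoder; the field names are hypothetical): build a Dataset[Row] with the nested schema, group on the key struct, and aggregate the inner value fields.

import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.catalyst.encoders.RowEncoder
import org.apache.spark.sql.functions.sum
import org.apache.spark.sql.types._

val spark = SparkSession.builder().appName("row-encoder-demo").getOrCreate()

// Nested schema matching Row(Row(val1, val2), Row(val1, val2, val3)).
val keySchema = StructType(Seq(
  StructField("k1", IntegerType), StructField("k2", IntegerType)))
val valSchema = StructType(Seq(
  StructField("v1", IntegerType), StructField("v2", IntegerType),
  StructField("v3", IntegerType)))
val schema = StructType(Seq(
  StructField("key", keySchema), StructField("value", valSchema)))

val rows = Seq(
  Row(Row(10, 11), Row(10, 2, 11)),
  Row(Row(10, 11), Row(10, 2, 11)),
  Row(Row(20, 11), Row(10, 2, 11)))

// Explicit encoder so a Dataset[Row] with this nested schema can be created.
implicit val rowEncoder = RowEncoder(schema)
val ds = spark.createDataset(rows)

// Group on the nested key struct, then aggregate fields of the value struct.
ds.groupBy("key").agg(sum("value.v1"), sum("value.v3")).show()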

Number Of Jobs In Spark Streaming

2016-03-04 Thread Sandip Mehta
Hi All, Is it fair to say that the number of jobs in a given Spark Streaming application is equal to the number of actions in the application? Regards, Sandeep
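A small illustrative sketch (the source and output path are hypothetical): each action executed inside an output operation such as foreachRDD triggers its own job, so the two actions here produce two jobs per batch interval.

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf().setAppName("job-count-demo").setMaster("local[2]")
val ssc = new StreamingContext(conf, Seconds(10))

val lines = ssc.socketTextStream("localhost", 9999)   // hypothetical source

lines.foreachRDD { rdd =>
  val n = rdd.count()                                             // action 1 -> job 1
  rdd.saveAsTextFile(s"/tmp/batch-${System.currentTimeMillis}")   // action 2 -> job 2
  println(s"records in batch: $n")
}

ssc.start()
ssc.awaitTermination()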

Spark SQL - Reading HCatalog Table

2015-12-03 Thread Sandip Mehta
Hi All, I have a table created in Hive and stored/read using HCatalog. The table is in ORC format. I want to read this table in Spark SQL and join it with RDDs. How can I connect to HCatalog and get data from Spark SQL? SM
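A minimal Spark 1.x-era sketch of one way to do this: read the Hive/HCatalog-managed ORC table through HiveContext (which goes via the Hive metastore), convert the RDD to a DataFrame, and join. The database, table, and column names are hypothetical:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

val sc = new SparkContext(new SparkConf().setAppName("hcatalog-read"))
val hiveContext = new HiveContext(sc)
import hiveContext.implicits._

// Hive-managed (HCatalog) ORC table, resolved through the metastore.
val orcDF = hiveContext.table("mydb.my_orc_table")   // hypothetical db.table

// Join with data that starts life as an RDD by converting it to a DataFrame.
val lookupDF = sc.parallelize(Seq((1, "a"), (2, "b"))).toDF("id", "label")
val joined = orcDF.join(lookupDF, orcDF("id") === lookupDF("id"))

joined.show()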

Re: Calculating Timeseries Aggregation

2015-11-19 Thread Sandip Mehta
ows.) > > I had a separate set of Spark jobs that pulled the raw data from Cassandra, > computed the aggregations and more complex metrics, and wrote it back to the > relevant Cassandra tables. These jobs ran periodically every few minutes. > > Regards, > Sanket

Re: Calculating Timeseries Aggregation

2015-11-18 Thread Sandip Mehta
dedicated data > storage systems. Like a database, or a key-value store. Spark Streaming would > just aggregate and push the necessary data to the data store. > > TD > > On Sat, Nov 14, 2015 at 9:32 PM, Sandip Mehta <sandip.mehta@gmail.com>

Re: Calculating Timeseries Aggregation

2015-11-18 Thread Sandip Mehta
tter. Otherwise, second and third approaches are more > efficient. > > TD > > On Wed, Nov 18, 2015 at 7:15 AM, Sandip Mehta <sandip.mehta@gmail.com> > wrote: > TD thank you for your reply. > > I agree on data st

Calculating Timeseries Aggregation

2015-11-14 Thread Sandip Mehta
Hi, I am working on a requirement to calculate real-time metrics and am building a prototype on Spark Streaming. I need to build aggregates at the seconds, minutes, hours, and day level. I am not sure whether I should calculate all these aggregates as different windowed functions on the input DStream or
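A minimal sketch of the "separate windowed function per granularity" approach asked about above (source, key format, and durations are illustrative; as the replies in this thread note, coarse granularities such as daily aggregates are usually better pushed to an external store):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Minutes, Seconds, StreamingContext}

val conf = new SparkConf().setAppName("timeseries-agg").setMaster("local[2]")
val ssc = new StreamingContext(conf, Seconds(10))
ssc.checkpoint("/tmp/timeseries-checkpoint")   // hypothetical checkpoint path

// Hypothetical stream of "metricName,value" lines keyed by metric name.
val metrics = ssc.socketTextStream("localhost", 9999)
  .map { line => val Array(k, v) = line.split(","); (k, v.toDouble) }

// One windowed aggregation per granularity over the same input DStream.
val perMinute = metrics.reduceByKeyAndWindow(_ + _, Minutes(1), Seconds(10))
val perHour   = metrics.reduceByKeyAndWindow(_ + _, Minutes(60), Minutes(1))

perMinute.print()
perHour.print()

ssc.start()
ssc.awaitTermination()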

Re: Checkpointing an InputDStream from Kafka

2015-11-07 Thread Sandip Mehta
I believe you’ll have to use another way of creating the StreamingContext, by passing a create function to the getOrCreate function. private def setupSparkContext(): StreamingContext = { val streamingSparkContext = { val sparkConf = new SparkConf().setAppName(config.appName).setMaster(config.master)
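A minimal sketch of the getOrCreate pattern (checkpoint path and stream setup are illustrative; the Kafka input stream from the original question would be created inside createContext so it can be restored from the checkpoint):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val checkpointDir = "/tmp/streaming-checkpoint"   // hypothetical path

def createContext(): StreamingContext = {
  val sparkConf = new SparkConf().setAppName("checkpointed-app").setMaster("local[2]")
  val ssc = new StreamingContext(sparkConf, Seconds(10))
  ssc.checkpoint(checkpointDir)

  // All DStream setup (e.g. the Kafka direct stream) must happen here so it
  // can be rebuilt when the context is recovered from the checkpoint.
  val lines = ssc.socketTextStream("localhost", 9999)
  lines.count().print()

  ssc
}

// Recover from the checkpoint if it exists, otherwise build a fresh context.
val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
ssc.start()
ssc.awaitTermination()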

Re: [Spark Streaming] Design Patterns forEachRDD

2015-10-21 Thread Sandip Mehta
Does this help? final JavaHBaseContext hbaseContext = new JavaHBaseContext(javaSparkContext, conf); customerModels.foreachRDD(new Function() { private static final long serialVersionUID = 1L; @Override public Void call(JavaRDD currentRDD) throws Exception {
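A Scala sketch of the general foreachRDD design pattern for writing a stream to an external store such as HBase: create the connection once per partition rather than once per record. The source and the ConnectionPool helper below are hypothetical stand-ins for a real HBase/JDBC connection factory:

import java.io.{FileWriter, PrintWriter}
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Hypothetical stand-in for a real connection pool.
object ConnectionPool {
  def getConnection(): PrintWriter = new PrintWriter(new FileWriter("/tmp/sink.txt", true))
  def returnConnection(conn: PrintWriter): Unit = conn.close()
}

val ssc = new StreamingContext(
  new SparkConf().setAppName("foreachRDD-pattern").setMaster("local[2]"), Seconds(5))

val customerModels = ssc.socketTextStream("localhost", 9999)   // hypothetical source

customerModels.foreachRDD { rdd =>
  // foreachRDD runs on the driver once per batch; foreachPartition runs on the
  // executors, so the connection is created once per partition, not per record.
  rdd.foreachPartition { records =>
    val conn = ConnectionPool.getConnection()
    records.foreach(record => conn.println(record))
    ConnectionPool.returnConnection(conn)
  }
}

ssc.start()
ssc.awaitTermination()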

Re: Call Site - Spark Context

2015-10-01 Thread Sandip Mehta
anning.com/books/spark-graphx-in-action> > > > > > >> On 1 Oct 2015, at 23:06, Sandip Mehta <sandip.mehta@gmail.com> >> wrote

Call Site - Spark Context

2015-10-01 Thread Sandip Mehta
Hi, I wanted to understand what the purpose of the Call Site in Spark Context is. Regards, SM
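A small illustrative sketch: the call site is the user-code location Spark records for each RDD and job and shows in the UI and logs, and it can be overridden per thread with setCallSite (the label and job below are hypothetical):

import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("callsite-demo").setMaster("local[2]"))

// Override the call-site label that the Spark UI shows for subsequent RDDs/actions.
sc.setCallSite("nightly-aggregation step 1")
val n = sc.parallelize(1 to 100).map(_ * 2).count()   // this job appears under that label
sc.clearCallSite()

println(s"count = $n")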