Re: how to change temp directory when spark write data ?

2018-12-05 Thread Sandip Mehta
Try the spark.local.dir property.
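For example, something like the sketch below (paths and app name are hypothetical; the
property can equally be passed as --conf spark.local.dir=... on spark-submit, and note
that on YARN or standalone the cluster manager's own local-directory settings may
override it):

import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

object WriterApp1 {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("writer-app-1")
      .set("spark.local.dir", "/data/tmp/writer-app-1") // scratch space private to this app

    val spark = SparkSession.builder().config(conf).getOrCreate()
    // ... read, transform and write data as usual ...
    spark.stop()
  }
}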


On Wed, Dec 5, 2018 at 1:42 PM JF Chen  wrote:

> I have two Spark apps writing data to one directory. I notice they share
> one temp directory, and the app that finishes writing first clears the temp
> directory, so the slower one may throw a "No lease on *** File does not
> exist" error.
> So how do I specify the temp directory?
> Thanks!
>
> Regards,
> Junfeng Chen
>


Re: [Structured Streaming] Reuse computation result

2018-02-01 Thread Sandip Mehta
You can use the persist() or cache() operation on the DataFrame.
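For example, if you are on a version that has foreachBatch (Spark 2.4+, which postdates
this thread), a minimal sketch of computing the shared map/filter result once per
micro-batch and feeding both aggregations from it — all names below are illustrative:

import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.sum

object ReusePerBatch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("reuse-per-batch").getOrCreate()
    import spark.implicits._

    // Stand-in for the real source; the rate source emits (timestamp, value) rows.
    val source = spark.readStream.format("rate").option("rowsPerSecond", "10").load()

    // The shared map/filter stage from the question.
    val df = source.filter($"value" % 2 === 0)

    // Both aggregations run against the same cached micro-batch, so the
    // map/filter work is done once per batch instead of once per query.
    val processBatch: (DataFrame, Long) => Unit = (batchDF, batchId) => {
      batchDF.persist()
      val total = batchDF.agg(sum($"value")).collect() // agg 1
      val rows  = batchDF.count()                      // agg 2
      // write `total` and `rows` to the real sink(s) here
      println(s"batch $batchId: sum=${total.mkString}, count=$rows")
      batchDF.unpersist()
    }

    df.writeStream.foreachBatch(processBatch).start().awaitTermination()
  }
}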

On Tue, Dec 26, 2017 at 4:02 PM Shu Li Zheng  wrote:

> Hi all,
>
> I have a scenario like this:
>
> val df = dataframe.map().filter()
> // agg 1
> val query1 = df.sum.writeStream.start
> // agg 2
> val query2 = df.count.writeStream.start
>
> With Spark Streaming, we can apply persist() on an RDD to reuse the
> computation result: when we call persist() after filter(), the map().filter()
> operators only run once.
> With Structured Streaming, we can’t apply persist() directly on the DataFrame, so
> query1 and query2 will not reuse the result after filter() and map/filter run twice.
> Is there a way to solve this?
>
> Regards,
>
> Shu li Zheng
>
>


Stateful Aggregation Using flatMapGroupsWithState

2017-12-16 Thread Sandip Mehta
Hi All,

I am getting the following error message while applying *flatMapGroupsWithState*:

*Exception in thread "main" org.apache.spark.sql.AnalysisException:
flatMapGroupsWithState in update mode is not supported with aggregation on
a streaming DataFrame/Dataset;;*

The following is what I am trying to do:

- Read messages from Kafka and parse them
- Group based on certain dimensions
- Run a UDAF for every group and calculate the aggregation for each group. agg()
doesn't return a KeyValueGroupedDataset, so I apply groupByKey on the previous
step's output to group on the aggregation column
- Merge these aggregates into the previous state of the stream using
*flatMapGroupsWithState*

I am getting the error message for the last step.

Does this error mean I cannot apply *flatMapGroupsWithState* after applying
agg() on a Dataset?
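For reference, here is a compact sketch of the pipeline shape described above (Kafka
servers, topic, fields and case classes are all hypothetical, and the Kafka source is
assumed to be on the classpath). Starting this query is what raises the
AnalysisException quoted above, since flatMapGroupsWithState in Update mode is being
applied after a streaming aggregation:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._
import org.apache.spark.sql.streaming.{GroupState, GroupStateTimeout, OutputMode}

object StatefulAggSketch {
  case class Agg(key: String, total: Long)             // hypothetical per-group aggregate
  case class AggState(key: String, runningTotal: Long) // hypothetical merged state

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("fmgws-sketch").getOrCreate()
    import spark.implicits._

    // Steps 1-2: read from Kafka and group on some dimension (names hypothetical).
    val parsed = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9092")
      .option("subscribe", "events")
      .load()
      .selectExpr("CAST(key AS STRING) AS key")         // stand-in for real parsing

    // Step 3: streaming aggregation per group.
    val aggregated = parsed
      .groupBy($"key")
      .agg(count(lit(1)).as("total"))
      .as[Agg]

    // Step 4: groupByKey + flatMapGroupsWithState on the aggregated stream.
    // This is the combination the analyzer rejects with the error quoted above.
    val merged = aggregated
      .groupByKey(_.key)
      .flatMapGroupsWithState(OutputMode.Update(), GroupStateTimeout.NoTimeout()) {
        (key: String, rows: Iterator[Agg], state: GroupState[AggState]) =>
          val prev    = state.getOption.getOrElse(AggState(key, 0L))
          val updated = AggState(key, prev.runningTotal + rows.map(_.total).sum)
          state.update(updated)
          Iterator(updated)
      }

    // The AnalysisException is thrown when the query is started.
    merged.writeStream.outputMode("update").format("console").start().awaitTermination()
  }
}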

*Regards*
*Sandeep*


Re: Row Encoder For DataSet

2017-12-07 Thread Sandip Mehta
Hi,

I want to group on certain columns and then, for every group, apply a custom UDF
to it. Currently groupBy only allows adding aggregation functions to GroupedData.

For this I was thinking of using groupByKey, which returns a KeyValueGroupedDataset,
and then applying the UDF to every group, but I have not really been able to solve this.
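Roughly, the shape I have in mind is something like the following minimal sketch, with
hypothetical case classes standing in for the Row(Row(x, y), Row(u, v, z)) schema and
the mapGroups body standing in for the custom UDF:

import org.apache.spark.sql.SparkSession

object GroupByKeySketch {
  // Hypothetical case classes in place of the nested Row schema.
  case class Key(x: Int, y: Int)
  case class Value(u: Int, v: Int, z: Int)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("groupByKey-sketch").master("local[*]").getOrCreate()
    import spark.implicits._

    val ds = Seq(
      (Key(10, 11), Value(10, 2, 11)),
      (Key(10, 11), Value(10, 2, 11)),
      (Key(20, 11), Value(10, 2, 11))
    ).toDS()

    // Group on the first element and apply an arbitrary per-group function;
    // the body of mapGroups stands in for the custom UDF logic.
    val perGroup = ds
      .groupByKey(_._1)
      .mapGroups { (key, rows) =>
        val total = rows.map(_._2.z).sum
        (key.x, key.y, total)
      }

    perGroup.show()
    spark.stop()
  }
}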

SM

On Fri, Dec 8, 2017 at 10:29 AM Weichen Xu <weichen...@databricks.com>
wrote:

> You can groupBy multiple columns on a DataFrame, so why do you need such a
> complicated schema?
>
> suppose df schema: (x, y, u, v, z)
>
> df.groupBy($"x", $"y").agg(...)
>
> Is this what you want?
>
> On Fri, Dec 8, 2017 at 11:51 AM, Sandip Mehta <sandip.mehta@gmail.com>
> wrote:
>
>> Hi,
>>
>> During my aggregation I end up having the following schema.
>>
>> Row(Row(val1,val2), Row(val1,val2,val3...))
>>
>> val values = Seq(
>> (Row(10, 11), Row(10, 2, 11)),
>> (Row(10, 11), Row(10, 2, 11)),
>> (Row(20, 11), Row(10, 2, 11))
>>   )
>>
>>
>> The 1st tuple is used to group the relevant records for aggregation. I have
>> used the following to create the dataset.
>>
>> val s = StructType(Seq(
>>   StructField("x", IntegerType, true),
>>   StructField("y", IntegerType, true)
>> ))
>> val s1 = StructType(Seq(
>>   StructField("u", IntegerType, true),
>>   StructField("v", IntegerType, true),
>>   StructField("z", IntegerType, true)
>> ))
>>
>> val ds = sparkSession.sqlContext.createDataset(
>>   sparkSession.sparkContext.parallelize(values))(
>>   Encoders.tuple(RowEncoder(s), RowEncoder(s1)))
>>
>> Is this the correct way of representing this?
>>
>> How do I create the dataset and row encoder for such a use case, for doing
>> groupByKey on this?
>>
>>
>>
>> Regards
>> Sandeep
>>
>
>


Row Encoder For DataSet

2017-12-07 Thread Sandip Mehta
Hi,

During my aggregation I end up having the following schema.

Row(Row(val1,val2), Row(val1,val2,val3...))

val values = Seq(
(Row(10, 11), Row(10, 2, 11)),
(Row(10, 11), Row(10, 2, 11)),
(Row(20, 11), Row(10, 2, 11))
  )


The 1st tuple is used to group the relevant records for aggregation. I have
used the following to create the dataset.

val s = StructType(Seq(
  StructField("x", IntegerType, true),
  StructField("y", IntegerType, true)
))
val s1 = StructType(Seq(
  StructField("u", IntegerType, true),
  StructField("v", IntegerType, true),
  StructField("z", IntegerType, true)
))

val ds = sparkSession.sqlContext.createDataset(
  sparkSession.sparkContext.parallelize(values))(
  Encoders.tuple(RowEncoder(s), RowEncoder(s1)))

Is this the correct way of representing this?

How do I create the dataset and row encoder for such a use case, for doing
groupByKey on this?



Regards
Sandeep


Number Of Jobs In Spark Streaming

2016-03-04 Thread Sandip Mehta
Hi All,

Is it fair to say that the number of jobs in a given Spark Streaming application
is equal to the number of actions in the application?

Regards
Sandeep



Spark SQL - Reading HCatalog Table

2015-12-03 Thread Sandip Mehta
Hi All,

I have a table created in Hive and stored/read using HCatalog. The table is in ORC
format. I want to read this table in Spark SQL and join it with RDDs. How can I
connect to HCatalog and get the data from Spark SQL?
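For context, here is roughly what I am considering — a minimal sketch with hypothetical
table and column names, assuming Spark is built with Hive support and hive-site.xml is
on the classpath, so the HCatalog/Hive-managed ORC table is read through the metastore
via HiveContext and then joined with a DataFrame built from an RDD:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object HCatalogOrcRead {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("hcatalog-orc-read"))
    val hiveContext = new HiveContext(sc)
    import hiveContext.implicits._

    // The HCatalog/Hive-managed ORC table is read through the metastore,
    // so no direct HCatalog API call is needed.
    val orders = hiveContext.sql("SELECT * FROM mydb.orders")   // hypothetical table

    // Data originating from an RDD, turned into a DataFrame for the join.
    val customers = sc
      .parallelize(Seq((1, "alice"), (2, "bob")))
      .toDF("customer_id", "name")                              // hypothetical columns

    val joined = orders.join(customers, "customer_id")          // hypothetical join key
    joined.show()
  }
}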

SM






Re: Calculating Timeseries Aggregation

2015-11-19 Thread Sandip Mehta
Thank you Sanket for the feedback.

Regards
SM
> On 19-Nov-2015, at 1:57 PM, Sanket Patil <sanket.pa...@knowlarity.com> wrote:
> 
> Hey Sandip:
> 
> TD has already outlined the right approach, but let me add a couple of 
> thoughts as I recently worked on a similar project. I had to compute some 
> real-time metrics on streaming data. Also, these metrics had to be aggregated 
> for hour/day/week/month. My data pipeline was Kafka --> Spark Streaming --> 
> Cassandra.
> 
> I had a spark streaming job that did the following: (1) receive a window of 
> raw streaming data and write it to Cassandra, and (2) do only the basic 
> computations that need to be shown on a real-time dashboard, and store the 
> results in Cassandra. (I had to use sliding window as my computation involved 
> joining data that might occur in different time windows.)
> 
> I had a separate set of Spark jobs that pulled the raw data from Cassandra, 
> computed the aggregations and more complex metrics, and wrote it back to the 
> relevant Cassandra tables. These jobs ran periodically every few minutes.
> 
> Regards,
> Sanket
> 
> On Thu, Nov 19, 2015 at 8:09 AM, Sandip Mehta <sandip.mehta@gmail.com 
> <mailto:sandip.mehta@gmail.com>> wrote:
> Thank you TD for your time and help.
> 
> SM
>> On 19-Nov-2015, at 6:58 AM, Tathagata Das <t...@databricks.com 
>> <mailto:t...@databricks.com>> wrote:
>> 
>> There are different ways to do the rollups. Either update rollups from the 
>> streaming application, or you can generate roll ups in a later process - say 
>> periodic Spark job every hour. Or you could just generate rollups on demand, 
>> when it is queried.
>> The whole thing depends on your downstream requirements - if you always to 
>> have up to date rollups to show up in dashboard (even day-level stuff), then 
>> the first approach is better. Otherwise, second and third approaches are 
>> more efficient.
>> 
>> TD
>> 
>> 
>> On Wed, Nov 18, 2015 at 7:15 AM, Sandip Mehta <sandip.mehta@gmail.com 
>> <mailto:sandip.mehta@gmail.com>> wrote:
>> TD thank you for your reply.
>> 
>> I agree on data store requirement. I am using HBase as an underlying store.
>> 
>> So for every batch interval of say 10 seconds
>> 
>> - Calculate the time dimension ( minutes, hours, day, week, month and 
>> quarter ) along with other dimensions and metrics
>> - Update relevant base table at each batch interval for relevant metrics for 
>> a given set of dimensions.
>> 
>> Only caveat I see is I’ll have to update each of the different roll up table 
>> for each batch window.
>> 
>> Is this a valid approach for calculating time series aggregation?
>> 
>> Regards
>> SM
>> 
>> For minutes level aggregates I have set up a streaming window say 10 seconds 
>> and storing minutes level aggregates across multiple dimension in HBase at 
>> every window interval. 
>> 
>>> On 18-Nov-2015, at 7:45 AM, Tathagata Das <t...@databricks.com 
>>> <mailto:t...@databricks.com>> wrote:
>>> 
>>> For this sort of long term aggregations you should use a dedicated data 
>>> storage systems. Like a database, or a key-value store. Spark Streaming 
>>> would just aggregate and push the necessary data to the data store. 
>>> 
>>> TD
>>> 
>>> On Sat, Nov 14, 2015 at 9:32 PM, Sandip Mehta <sandip.mehta@gmail.com 
>>> <mailto:sandip.mehta@gmail.com>> wrote:
>>> Hi,
>>> 
>>> I am working on requirement of calculating real time metrics and building 
>>> prototype  on Spark streaming. I need to build aggregate at Seconds, 
>>> Minutes, Hours and Day level.
>>> 
>>> I am not sure whether I should calculate all these aggregates as  different 
>>> Windowed function on input DStream or shall I use updateStateByKey function 
>>> for the same. If I have to use updateStateByKey for these time series 
>>> aggregation, how can I remove keys from the state after different time 
>>> lapsed?
>>> 
>>> Please suggest.
>>> 
>>> Regards
>>> SM
>>> -
>>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org 
>>> <mailto:user-unsubscr...@spark.apache.org>
>>> For additional commands, e-mail: user-h...@spark.apache.org 
>>> <mailto:user-h...@spark.apache.org>
>>> 
>>> 
>> 
>> 
> 
> 
> 
> 


Re: Calculating Timeseries Aggregation

2015-11-18 Thread Sandip Mehta
TD, thank you for your reply.

I agree on data store requirement. I am using HBase as an underlying store.

So for every batch interval of say 10 seconds

- Calculate the time dimensions (minute, hour, day, week, month and quarter)
along with the other dimensions and metrics
- Update the relevant base table at each batch interval for the relevant metrics for a
given set of dimensions.

The only caveat I see is that I’ll have to update each of the different roll-up tables
for each batch window.

Is this a valid approach for calculating time series aggregation?

Regards
SM

For minute-level aggregates I have set up a streaming window of, say, 10 seconds,
and I store minute-level aggregates across multiple dimensions in HBase at
every window interval.
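Concretely, the per-batch flow I have in mind looks roughly like this — a sketch only,
where the source format, table names and the incrementCounter stub are hypothetical
stand-ins for the real HBase writes:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object RollupSketch {
  // Stub standing in for the real HBase write (e.g. an Increment against the base table).
  def incrementCounter(table: String, rowKey: String, delta: Long): Unit = ()

  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(new SparkConf().setAppName("rollups"), Seconds(10))

    // Stand-in source: each line is "<epochMillis>,<dimension>,<metric>".
    val events = ssc.socketTextStream("localhost", 9999).map { line =>
      val Array(ts, dim, metric) = line.split(",")
      (ts.toLong, dim, metric.toLong)
    }

    events.foreachRDD { rdd =>
      rdd.foreachPartition { records =>
        records.foreach { case (tsMillis, dim, metric) =>
          // One row key per time granularity; each batch updates every roll-up table.
          val minuteBucket = tsMillis / (60L * 1000) * (60L * 1000)
          val hourBucket   = tsMillis / (3600L * 1000) * (3600L * 1000)
          val dayBucket    = tsMillis / (86400L * 1000) * (86400L * 1000)
          incrementCounter("agg_minute", s"$dim|$minuteBucket", metric)
          incrementCounter("agg_hour",   s"$dim|$hourBucket",   metric)
          incrementCounter("agg_day",    s"$dim|$dayBucket",    metric)
        }
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}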

> On 18-Nov-2015, at 7:45 AM, Tathagata Das <t...@databricks.com> wrote:
> 
> For this sort of long-term aggregation you should use a dedicated data
> storage system, like a database or a key-value store. Spark Streaming would
> just aggregate and push the necessary data to the data store.
> 
> TD
> 
> On Sat, Nov 14, 2015 at 9:32 PM, Sandip Mehta <sandip.mehta@gmail.com 
> <mailto:sandip.mehta@gmail.com>> wrote:
> Hi,
> 
> I am working on a requirement of calculating real-time metrics and building a
> prototype on Spark Streaming. I need to build aggregates at the second, minute,
> hour and day level.
> 
> I am not sure whether I should calculate all these aggregates as different
> windowed functions on the input DStream, or whether I should use the updateStateByKey
> function for the same. If I have to use updateStateByKey for these time series
> aggregations, how can I remove keys from the state after different time periods have lapsed?
> 
> Please suggest.
> 
> Regards
> SM
> -
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org 
> <mailto:user-unsubscr...@spark.apache.org>
> For additional commands, e-mail: user-h...@spark.apache.org 
> <mailto:user-h...@spark.apache.org>
> 
> 



Re: Calculating Timeseries Aggregation

2015-11-18 Thread Sandip Mehta
Thank you TD for your time and help.

SM
> On 19-Nov-2015, at 6:58 AM, Tathagata Das <t...@databricks.com> wrote:
> 
> There are different ways to do the rollups. Either update rollups from the 
> streaming application, or you can generate rollups in a later process - say a 
> periodic Spark job every hour. Or you could just generate rollups on demand, 
> when they are queried.
> The whole thing depends on your downstream requirements - if you always want to 
> have up-to-date rollups to show up in the dashboard (even day-level stuff), then 
> the first approach is better. Otherwise, the second and third approaches are more 
> efficient.
> 
> TD
> 
> 
> On Wed, Nov 18, 2015 at 7:15 AM, Sandip Mehta <sandip.mehta@gmail.com 
> <mailto:sandip.mehta@gmail.com>> wrote:
> TD thank you for your reply.
> 
> I agree on data store requirement. I am using HBase as an underlying store.
> 
> So for every batch interval of say 10 seconds
> 
> - Calculate the time dimensions (minute, hour, day, week, month and quarter)
> along with the other dimensions and metrics
> - Update the relevant base table at each batch interval for the relevant metrics for
> a given set of dimensions.
> 
> The only caveat I see is that I’ll have to update each of the different roll-up tables
> for each batch window.
> 
> Is this a valid approach for calculating time series aggregation?
> 
> Regards
> SM
> 
> For minute-level aggregates I have set up a streaming window of, say, 10 seconds,
> and I store minute-level aggregates across multiple dimensions in HBase at
> every window interval.
> 
>> On 18-Nov-2015, at 7:45 AM, Tathagata Das <t...@databricks.com 
>> <mailto:t...@databricks.com>> wrote:
>> 
>> For this sort of long term aggregations you should use a dedicated data 
>> storage systems. Like a database, or a key-value store. Spark Streaming 
>> would just aggregate and push the necessary data to the data store. 
>> 
>> TD
>> 
>> On Sat, Nov 14, 2015 at 9:32 PM, Sandip Mehta <sandip.mehta@gmail.com 
>> <mailto:sandip.mehta@gmail.com>> wrote:
>> Hi,
>> 
>> I am working on requirement of calculating real time metrics and building 
>> prototype  on Spark streaming. I need to build aggregate at Seconds, 
>> Minutes, Hours and Day level.
>> 
>> I am not sure whether I should calculate all these aggregates as  different 
>> Windowed function on input DStream or shall I use updateStateByKey function 
>> for the same. If I have to use updateStateByKey for these time series 
>> aggregation, how can I remove keys from the state after different time 
>> lapsed?
>> 
>> Please suggest.
>> 
>> Regards
>> SM
>> -
>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org 
>> <mailto:user-unsubscr...@spark.apache.org>
>> For additional commands, e-mail: user-h...@spark.apache.org 
>> <mailto:user-h...@spark.apache.org>
>> 
>> 
> 
> 



Calculating Timeseries Aggregation

2015-11-14 Thread Sandip Mehta
Hi,

I am working on a requirement of calculating real-time metrics and building a
prototype on Spark Streaming. I need to build aggregates at the second, minute,
hour and day level.

I am not sure whether I should calculate all these aggregates as different
windowed functions on the input DStream, or whether I should use the updateStateByKey
function for the same. If I have to use updateStateByKey for these time series
aggregations, how can I remove keys from the state after different time periods have lapsed?
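As far as I can tell, returning None from the updateStateByKey update function removes
that key from the state, so a rough sketch of what I have in mind would be the following
(the source, key/value types and retention period are hypothetical):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object TimeseriesStateSketch {
  // Hypothetical running aggregate: a count plus the time the key was last seen.
  case class Agg(count: Long, lastUpdatedMs: Long)

  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(new SparkConf().setAppName("ts-agg"), Seconds(10))
    ssc.checkpoint("checkpoint")                     // required for stateful operations

    // Stand-in for the real source; each line becomes a (key, 1) pair.
    val events = ssc.socketTextStream("localhost", 9999).map(line => (line, 1L))

    val retentionMs = 60 * 60 * 1000L                // hypothetical: expire keys after 1 hour

    val aggregated = events.updateStateByKey[Agg] { (values: Seq[Long], state: Option[Agg]) =>
      val now  = System.currentTimeMillis()
      val prev = state.getOrElse(Agg(0L, now))
      if (values.isEmpty && now - prev.lastUpdatedMs > retentionMs) {
        None                                         // returning None removes the key from the state
      } else if (values.isEmpty) {
        Some(prev)                                   // key idle but still within retention
      } else {
        Some(Agg(prev.count + values.sum, now))
      }
    }

    aggregated.print()
    ssc.start()
    ssc.awaitTermination()
  }
}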

Please suggest.

Regards
SM



Re: Checkpointing an InputDStream from Kafka

2015-11-07 Thread Sandip Mehta
I believe you’ll have to use another way of creating the StreamingContext, by
passing a creating function to the getOrCreate function.

private def setupSparkContext(): StreamingContext = {
  val streamingSparkContext = {
val sparkConf = new 
SparkConf().setAppName(config.appName).setMaster(config.master)
new StreamingContext(sparkConf, config.batchInterval)
  }
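  // Note: the checkpoint directory is set on the freshly built context here; on
  // restart, getOrCreate restores from that directory instead of calling this function.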
  streamingSparkContext.checkpoint(config.checkpointDir)
  streamingSparkContext
}

….
val ssc = StreamingContext.getOrCreate(config.checkpointDir, setupSparkContext)

Javadoc for getOrCreate

/**
 * Either recreate a StreamingContext from checkpoint data or create a new 
StreamingContext.
 * If checkpoint data exists in the provided `checkpointPath`, then 
StreamingContext will be
 * recreated from the checkpoint data. If the data does not exist, then the 
StreamingContext
 * will be created by calling the provided `creatingFunc`.
 *
 * @param checkpointPath Checkpoint directory used in an earlier 
StreamingContext program
 * @param creatingFunc   Function to create a new StreamingContext
 * @param hadoopConf Optional Hadoop configuration if necessary for reading 
from the
 *   file system
 * @param createOnError  Optional, whether to create a new StreamingContext if 
there is an
 *   error in reading checkpoint data. By default, an 
exception will be
 *   thrown on error.
 */

Hope this helps!

SM



> On 06-Nov-2015, at 8:19 PM, Cody Koeninger  wrote:
> 
> Have you looked at the driver and executor logs?
> 
> Without being able to see what's in the "do stuff with the dstream" section 
> of code... I'd suggest starting with a simpler job, e.g. one that does nothing but 
> print each message, and verify whether it checkpoints
> 
> On Fri, Nov 6, 2015 at 3:59 AM, Kathi Stutz wrote:
> Hi all,
> 
> I want to load an InputDStream from a checkpoint, but it doesn't work, and
> after trying several things I have finally run out of ideas.
> 
> So, here's what I do:
> 
> 1. I create the streaming context - or load it from the checkpoint directory.
> 
>   def main(args: Array[String]) {
> val ssc = StreamingContext.getOrCreate("files/checkpoint",
> createStreamingContext _)
> ssc.start()
> ssc.awaitTermination()
>   }
> 
> 2. In the function createStreamingContext(), I first create a new Spark
> config...
> 
>   def createStreamingContext(): StreamingContext = {
> println("New Context")
> 
> val conf = new SparkConf()
>   .setMaster("local[2]")
>   .setAppName("CheckpointTest")
>   .set("spark.streaming.kafka.maxRatePerPartition", "1")
> 
> //...then I create the streaming context...
> val ssc = new StreamingContext(conf, Seconds(1))
> 
> var offsetRanges = Array[OffsetRange]()
> val kafkaParams = Map("metadata.broker.list" ->
> "sandbox.hortonworks.com:6667 ",
>   "auto.offset.reset" -> "smallest") //Start from beginning
> val kafkaTopics = Set("Bla")
> 
> //...then I go and get a DStream from Kafka...
> val directKafkaStream = KafkaUtils.createDirectStream[String,
> Array[Byte], StringDecoder, DefaultDecoder](ssc,
> kafkaParams, kafkaTopics)
> 
> //...I do stuff with the DStream
> ...
> 
> //...and finally I checkpoint the streaming context and return it
> ssc.checkpoint("files/checkpoint")
> ssc
> }
> 
> 3. When I start the application, after a while it creates in
> files/checkpoint/ an empty directory with a name like
> 23207ed2-c021-4a1d-8af8-0620a19a8665. But that's all, no more files or
> directories or whatever appear there.
> 
> 4. When I stop the application and restart it, it creates a new streaming
> context each time. (This also means it starts the Kafka streaming from the
> smallest available offset again and again. The main reason for using
> checkpoints for me was to avoid having to keep track of Kafka offsets.)
> 
> So, what am I doing wrong?
> 
> Thanks a lot!
> 
> Kathi
> 
> -
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org 
> 
> For additional commands, e-mail: user-h...@spark.apache.org 
> 
> 
> 



Re: [Spark Streaming] Design Patterns forEachRDD

2015-10-21 Thread Sandip Mehta
Does this help?

// CustomerModel, CustomerPromotion and the partition writer passed to
// foreachPartition are placeholder names.
final JavaHBaseContext hbaseContext = new JavaHBaseContext(javaSparkContext, conf);
customerModels.foreachRDD(new Function<JavaRDD<CustomerModel>, Void>() {
  private static final long serialVersionUID = 1L;
  @Override
  public Void call(JavaRDD<CustomerModel> currentRDD) throws Exception {
    // Enrich each partition against HBase, then hand the result to a partition-level writer.
    JavaRDD<CustomerPromotion> customerWithPromotion =
        hbaseContext.mapPartition(currentRDD, new PromotionLookupFunction());
    customerWithPromotion.persist(StorageLevel.MEMORY_AND_DISK_SER());
    customerWithPromotion.foreachPartition(new PromotionWriterFunction());
    return null;
  }
});


> On 21-Oct-2015, at 10:55 AM, Nipun Arora  wrote:
> 
> Hi All,
> 
> Can anyone provide a design pattern for the following code shown in the Spark 
> User Manual, in Java? I have the exact same use case, and for some reason 
> the design pattern for Java is missing.
> 
>  Scala version taken from : 
> http://spark.apache.org/docs/latest/streaming-programming-guide.html#design-patterns-for-using-foreachrdd
>  
> 
> 
> dstream.foreachRDD { rdd =>
>   rdd.foreachPartition { partitionOfRecords =>
> val connection = createNewConnection()
> partitionOfRecords.foreach(record => connection.send(record))
> connection.close()
>   }
> }
> 
> I have googled for it and haven't really found a solution. This seems to be 
> an important piece of information, especially for people who need to ship 
> their code necessarily in Java because of constraints in the company (like 
> me) :)
> 
> I'd really appreciate any help
> 
> Thanks
> Nipun



Re: Call Site - Spark Context

2015-10-01 Thread Sandip Mehta
Thanks, Robin.
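That matches what I see. As a quick check I put together this small sketch (app name and
file path are hypothetical): the recorded call site shows up in the RDD's debug string and
in the web UI, and it looks like it can also be overridden with setCallSite/clearCallSite.

import org.apache.spark.{SparkConf, SparkContext}

object CallSiteSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("callsite-sketch").setMaster("local[*]"))

    // The call site recorded for this RDD is this line in user code; it appears in
    // the web UI and in the RDD's debug string ("... at parallelize at ...").
    val numbers = sc.parallelize(1 to 10)
    println(numbers.toDebugString)

    // The short form shown in the UI can also be overridden explicitly.
    sc.setCallSite("loading-orders")
    val orders = sc.textFile("orders.txt")   // hypothetical path; labeled "loading-orders" in the UI
    sc.clearCallSite()

    println(orders.toDebugString)
    sc.stop()
  }
}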

Regards
SM
> On 01-Oct-2015, at 3:15 pm, Robin East <robin.e...@xense.co.uk> wrote:
> 
> From the comments in the code:
> 
> When called inside a class in the spark package, returns the name of the user 
> code class (outside the spark package) that called into Spark, as well as 
> which Spark method they called. This is used, for example, to tell users 
> where in their code each RDD got created.
> Keep crawling up the stack trace until we find the first function not inside 
> of the spark package. We track the last (shallowest) contiguous Spark method. 
> This might be an RDD transformation, a SparkContext function (such as 
> parallelize), or anything else that leads to instantiation of an RDD. We also 
> track the first (deepest) user method, file, and line.
> So basically it’s a mechanism to report where in the user’s code an RDD is 
> created.
> ---
> Robin East
> Spark GraphX in Action Michael Malak and Robin East
> Manning Publications Co.
> http://www.manning.com/books/spark-graphx-in-action 
> <http://www.manning.com/books/spark-graphx-in-action>
> 
> 
> 
> 
> 
>> On 1 Oct 2015, at 23:06, Sandip Mehta <sandip.mehta@gmail.com 
>> <mailto:sandip.mehta@gmail.com>> wrote:
>> 
>> Hi,
>> 
>> I wanted to understand what is the purpose of Call Site in Spark Context?
>> 
>> Regards
>> SM
>> 
>> -
>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org 
>> <mailto:user-unsubscr...@spark.apache.org>
>> For additional commands, e-mail: user-h...@spark.apache.org 
>> <mailto:user-h...@spark.apache.org>
>> 
> 



Call Site - Spark Context

2015-10-01 Thread Sandip Mehta
Hi,

I wanted to understand: what is the purpose of the call site in SparkContext?

Regards
SM
