Hi,
What is the way to stop a Spark Streaming job if there is no data inflow
for an arbitrary amount of time (e.g., 2 minutes)?
Thanks,
Aakash.
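(Not from this thread, but one common pattern for this question: track the time of the last non-empty batch with a StreamingListener and stop the context from the monitoring loop once it has been idle too long. The timeout value and names below are illustrative.)

```scala
import java.util.concurrent.atomic.AtomicLong
import org.apache.spark.streaming.StreamingContext
import org.apache.spark.streaming.scheduler.{StreamingListener, StreamingListenerBatchCompleted}

// Illustrative timeout: stop after 2 minutes with no incoming records.
val IdleTimeoutMs = 2 * 60 * 1000L

class IdleListener extends StreamingListener {
  private val lastDataTime = new AtomicLong(System.currentTimeMillis())
  override def onBatchCompleted(batch: StreamingListenerBatchCompleted): Unit = {
    // Only batches that actually carried records reset the idle clock.
    if (batch.batchInfo.numRecords > 0) {
      lastDataTime.set(System.currentTimeMillis())
    }
  }
  def idleForMs: Long = System.currentTimeMillis() - lastDataTime.get()
}

def runWithIdleShutdown(ssc: StreamingContext): Unit = {
  val listener = new IdleListener
  ssc.addStreamingListener(listener)
  ssc.start()
  // Poll with awaitTerminationOrTimeout instead of awaitTermination,
  // so we get a chance to check the idle clock every 10 seconds.
  while (!ssc.awaitTerminationOrTimeout(10000)) {
    if (listener.idleForMs > IdleTimeoutMs) {
      ssc.stop(stopSparkContext = true, stopGracefully = true)
    }
  }
}
```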
Hi Shmuel,
Did you compile the code against the right branch for Spark 1.6?
I tested it and it looks to be working, and I'm now running wider tests on
that branch. Please use the branch for Spark 1.6.
On Fri, Mar 23, 2018 at 12:43 AM, Shmuel Blitz
wrote:
> Hi Rohit,
>
>
Hi:
I am working on a real-time application using Spark Structured Streaming (v
2.2.1). The application reads data from Kafka, and if there is a failure, I
would like to ignore the checkpoint. Is there any configuration to just read
from the last Kafka offset after a failure and ignore any offset
Structured Streaming AUTOMATICALLY saves the offsets in a checkpoint
directory that you provide. And when you start the query again with the
same directory it will just pick up where it left off.
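For reference, the checkpoint directory mentioned above is set through the `checkpointLocation` option when the query is started; the broker address, topic, and paths below are made up:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("checkpoint-demo").getOrCreate()

val input = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker:9092") // hypothetical broker
  .option("subscribe", "events")                    // hypothetical topic
  .option("startingOffsets", "latest")              // only used when no checkpoint exists
  .load()

// Restarting with the same checkpointLocation resumes from the saved offsets.
// To deliberately discard saved progress, point this at a fresh directory so
// startingOffsets takes effect again.
val query = input.writeStream
  .format("parquet")
  .option("path", "/data/out")
  .option("checkpointLocation", "/checkpoints/my-query")
  .start()
```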
Hi:
I am working with Spark (2.2.1) and Kafka (0.10) on AWS EMR, and for the last
few days, after running the application for 30-60 minutes, I get the exception
from the Kafka consumer included below.
The Structured Streaming application processes 1 minute's worth of data from
the Kafka topic. So I've tried
Yes indeed, we don't directly support schema migration of state as of now.
However, depending on which stateful operator you are using, you can work
around it. For example, if you are using mapGroupsWithState /
flatMapGroupsWithState, you can explicitly convert your state to
avro-encoded bytes
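A sketch of that workaround: keep the state as raw bytes so the encoding is under your control (the event type and the encode/decode helpers below are placeholders; in practice they would be Avro serialization of your real state class, which is what lets you manage schema evolution yourself):

```scala
import org.apache.spark.sql.streaming.GroupState

case class Event(key: String, value: Long)

// Placeholder codecs standing in for real Avro (de)serialization.
def encodeState(count: Long): Array[Byte] = BigInt(count).toByteArray
def decodeState(bytes: Array[Byte]): Long = BigInt(bytes).toLong

// The state type seen by Spark is Array[Byte], so Spark never needs to
// understand (or migrate) the internal schema of the state.
def updateState(key: String, events: Iterator[Event],
                state: GroupState[Array[Byte]]): (String, Long) = {
  val previous = if (state.exists) decodeState(state.get) else 0L
  val updated  = previous + events.map(_.value).sum
  state.update(encodeState(updated))
  (key, updated)
}

// Usage: ds.groupByKey(_.key).mapGroupsWithState(updateState _)
```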
I am trying to research a custom Aggregator implementation, and am following
the example in the Spark sample code here:
https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/sql/UserDefinedTypedAggregation.scala
But I cannot use it in the agg function, and
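(In case it helps: a minimal typed Aggregator along the lines of that sample. The usual sticking point is that a typed Aggregator must be converted to a Column via `.toColumn` before being passed to `agg`. The Sale class and values here are illustrative.)

```scala
import org.apache.spark.sql.{Encoder, Encoders, SparkSession}
import org.apache.spark.sql.expressions.Aggregator

case class Sale(item: String, amount: Double)

// Sums amounts; input/buffer/output types are Sale, Double, Double.
object SumAmount extends Aggregator[Sale, Double, Double] {
  def zero: Double = 0.0
  def reduce(b: Double, s: Sale): Double = b + s.amount
  def merge(b1: Double, b2: Double): Double = b1 + b2
  def finish(b: Double): Double = b
  def bufferEncoder: Encoder[Double] = Encoders.scalaDouble
  def outputEncoder: Encoder[Double] = Encoders.scalaDouble
}

val spark = SparkSession.builder().appName("agg-demo").master("local[*]").getOrCreate()
import spark.implicits._

val sales = Seq(Sale("a", 1.0), Sale("a", 2.0), Sale("b", 5.0)).toDS()
// .toColumn turns the typed Aggregator into something agg() accepts.
val result = sales.groupByKey(_.item).agg(SumAmount.toColumn.name("total"))
```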
Hi Cody,
I am trying to implement exactly-once semantics and also store the offsets
in a database. The question I have is how to use Hive instead of traditional
datastores: the write to Hive will succeed even if there is an issue with
saving the offsets into the DB. Could you please
Hi Rohit,
Thanks for sharing this great tool.
I tried running a Spark job with the tool, but it failed with an
*IncompatibleClassChangeError* exception.
I have opened an issue on GitHub
(https://github.com/qubole/sparklens/issues/1).
Shmuel
On Thu, Mar 22, 2018 at 5:05 PM, Shmuel Blitz
Thanks.
We will give this a try and report back.
Shmuel
On Thu, Mar 22, 2018 at 4:22 PM, Rohit Karlupia wrote:
> Thanks everyone!
> Please share how it works and how it doesn't. Both help.
>
> Fawaze, just made few changes to make this work with spark 1.6. Can you
> please
Thanks all!
On Thu, Mar 22, 2018 at 2:08 AM, Jorge Machado wrote:
> DataFrames are not mutable.
>
> Jorge Machado
>
>
> On 22 Mar 2018, at 10:07, Aakash Basu wrote:
>
> Hey,
>
> I faced the same issue a couple of days back, kindly go through the mail
Thanks everyone!
Please share how it works and how it doesn't. Both help.
Fawaze, just made few changes to make this work with spark 1.6. Can you
please try building from branch *spark_1.6*
thanks,
rohitk
On Thu, Mar 22, 2018 at 10:18 AM, Fawze Abujaber wrote:
> It's
The Spark context runs in the driver, whereas the function inside foreach
runs on the executors. You can pass the parameter into the function so it is
available on the executors.
On Thu, 22 Mar 2018 at 8:18 pm, Kamalanathan Venkatesan <
kamalanatha...@in.ey.com> wrote:
> Hello All,
>
>
>
> I have custom parameter say for
Hello All,
I have a custom parameter, say for example a file name, added to the conf of
the Spark context, e.g. SparkConf.set(INPUT_FILE_NAME, fileName).
I need this value inside a foreach performed on an RDD, but when I access
the Spark context inside the foreach, I receive a "Spark context is null"
exception!
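(To make the earlier reply concrete: read the value from the conf into a local val on the driver, and let the closure capture that plain value; the SparkContext itself cannot be used inside foreach. The config key and values below mirror the question but are illustrative.)

```scala
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf().setAppName("conf-demo").setMaster("local[*]")
  .set("spark.myapp.inputFileName", "data.csv") // illustrative key/value

val sc = new SparkContext(conf)

// Read on the driver; the plain String is then serialized into the closure.
val fileName = sc.getConf.get("spark.myapp.inputFileName")

sc.parallelize(Seq(1, 2, 3)).foreach { x =>
  // fileName is available here; referencing sc itself would fail,
  // since the SparkContext is not serializable to executors.
  println(s"$fileName -> $x")
}
```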
DataFrames are not mutable.
Jorge Machado
> On 22 Mar 2018, at 10:07, Aakash Basu wrote:
>
> Hey,
>
> I faced the same issue a couple of days back, kindly go through the mail
> chain with "Multiple Kafka Spark Streaming Dataframe Join query" as subject,
> TD
Hey,
I faced the same issue a couple of days back; kindly go through the mail
chain with "*Multiple Kafka Spark Streaming Dataframe Join query*" as the
subject. TD and Chris have cleared my doubts; it would help you too.
Thanks,
Aakash.
On Thu, Mar 22, 2018 at 7:50 AM, kant kodali
Hey Jorge,
Thanks for responding.
Can you elaborate on the user permission part? HDFS or local?
As of now, hdfs path ->
hdfs://n2pl-pa-hdn220.xxx.xxx:8020/user/yarn/.sparkStaging/application_1521457397747_0013/__spark_libs__8247917347016008883.zip
already has complete access for yarn
Hi All,
As Druid uses Hadoop MapReduce to ingest batch data, I am trying Spark for
ingesting data into Druid, taking reference from
https://github.com/metamx/druid-spark-batch
But we are stuck at the following error.
Application Log:
2018-03-20T07:54:28,782 INFO [task-runner-0-priority-0]
Seems to me like a permissions problem! Can you check your user / folder
permissions?
Jorge Machado
> On 22 Mar 2018, at 08:21, nayan sharma wrote:
>
> Hi All,
> As druid uses Hadoop MapReduce to ingest batch data but I am trying spark for
> ingesting data into