Difference between typed and untyped transformations in the Dataset API

2019-02-21 Thread Akhilanand
What is the key difference between typed and untyped transformations in the Dataset API? How do I determine whether a transformation is typed or untyped? Are there any gotchas in choosing one over the other, apart from whichever one does the job for me?
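A rough rule of thumb, assuming the Spark 2.x Scala API: typed transformations (e.g. `map`, `flatMap`, `filter` with a lambda, `groupByKey`, `joinWith`) keep a concrete element type and return `Dataset[U]`, while untyped transformations (e.g. `select`, `withColumn`, `groupBy(...).agg(...)`, `join`) work on named columns and return `DataFrame`, which is `Dataset[Row]`. The `Dataset` scaladoc lists methods under "Typed transformations" and "Untyped transformations" headings, so checking the return type or that heading answers the second question. The sketch below is illustrative only; the `Sale` case class and the data are hypothetical, and it assumes `spark-sql` on the classpath with a local master.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}
import org.apache.spark.sql.functions.{col, sum}

// Hypothetical record type, used only for this illustration.
case class Sale(product: String, amount: Long)

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("typed-vs-untyped")
  .getOrCreate()
import spark.implicits._

val sales: Dataset[Sale] = Seq(Sale("a", 10L), Sale("b", 5L), Sale("a", 7L)).toDS()

// Typed: the lambda receives Sale objects and the result keeps a Scala type.
// A misspelled field such as s.amout is rejected by the compiler.
val big: Dataset[Sale] = sales.filter(s => s.amount > 6)
val typedTotals: Dataset[(String, Long)] =
  sales.groupByKey(_.product).mapGroups((p, it) => (p, it.map(_.amount).sum))

// Untyped: columns are referenced by name and the result is a DataFrame.
// A misspelled name such as col("amout") only fails when Spark analyzes the plan.
val untypedTotals: DataFrame =
  sales.groupBy(col("product")).agg(sum(col("amount")).as("total"))
```

One commonly cited gotcha: a typed lambda is opaque to the Catalyst optimizer, so rows must be deserialized into objects and a predicate inside the lambda cannot be pushed down to the data source, whereas equivalent column expressions can be optimized. The trade-off is compile-time checking (typed) versus optimizer visibility (untyped).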

Re: Spark-hive integration on HDInsight

2019-02-21 Thread amit kumar singh
Hey Jay, how are you creating your cluster? Are you using a Spark cluster? All of this should be set up automatically. > On Feb 21, 2019, at 12:12 PM, Felix Cheung wrote: > > You should check with HDInsight support > > From: Jay Singh > Sent: Wednesday, February 20,

Re: Spark-hive integration on HDInsight

2019-02-21 Thread Felix Cheung
You should check with HDInsight support. From: Jay Singh Sent: Wednesday, February 20, 2019 11:43:23 PM To: User Subject: Spark-hive integration on HDInsight I am trying to integrate Spark with Hive on an HDInsight Spark cluster. I copied hive-site.xml in

Re: Spark Streaming - Problem managing Kafka offsets; it starts from the beginning.

2019-02-21 Thread Gabor Somogyi
From the info you've provided, there's not much to say. Maybe you could put together a sample app, logs, etc., open a JIRA, and we can take a deeper look at it... BR, G On Thu, Feb 21, 2019 at 4:14 PM Guillermo Ortiz wrote: > I'm working with Spark Streaming 2.0.2 and Kafka 1.0.0 using Direct Stream > as

Spark Streaming - Problem managing Kafka offsets; it starts from the beginning.

2019-02-21 Thread Guillermo Ortiz
I'm working with Spark Streaming 2.0.2 and Kafka 1.0.0, using the Direct Stream as the connector. I consume data from Kafka and autosave the offsets. I can see in the logs that Spark commits the last processed offsets. Sometimes I have restarted Spark and it has started from the beginning, when I'm using the
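For reference, the pattern described in the `spark-streaming-kafka-0-10` integration guide is to disable the consumer's auto-commit and commit offsets back to Kafka only after a batch's output has finished. A restart then resumes from the committed offsets of the same `group.id`; `auto.offset.reset` applies only when no committed offset is found (for example, if the group id changed or the broker expired the stored offsets per `offsets.retention.minutes`). This is a minimal sketch of that pattern, not a diagnosis of the problem above; the broker address, group id, batch interval, and the `println` output step are hypothetical placeholders.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.{CanCommitOffsets, HasOffsetRanges, KafkaUtils}
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent

object KafkaDirectStreamSketch {
  // Broker list and group id are hypothetical placeholders.
  val kafkaParams: Map[String, Object] = Map(
    "bootstrap.servers" -> "localhost:9092",
    "key.deserializer" -> classOf[StringDeserializer],
    "value.deserializer" -> classOf[StringDeserializer],
    // Committed offsets are stored per group.id, so keep it stable across restarts.
    "group.id" -> "hypothetical-stable-group-id",
    // Consulted only when the group has no committed offset for a partition.
    "auto.offset.reset" -> "latest",
    // Commit manually after output succeeds instead of on the consumer's timer.
    "enable.auto.commit" -> (false: java.lang.Boolean)
  )

  def start(topics: Seq[String]): StreamingContext = {
    val conf = new SparkConf().setAppName("kafka-offsets-sketch")
    val ssc = new StreamingContext(conf, Seconds(5))
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](topics, kafkaParams))

    stream.foreachRDD { rdd =>
      // The cast only works on the RDD produced directly by the stream,
      // before any transformation is applied to it.
      val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
      // Placeholder for the real output step (runs on the executors).
      rdd.map(_.value).foreach(v => println(v))
      // Commit back to Kafka only after this batch's output has completed.
      stream.asInstanceOf[CanCommitOffsets].commitAsync(offsetRanges)
    }
    ssc.start()
    ssc
  }
}
```

Note that `commitAsync` gives at-least-once semantics: if the job dies after output but before the commit lands, the batch is reprocessed on restart.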

Structured streaming performance issues

2019-02-21 Thread gvdongen
Hi everyone, I have the following pipeline: ingest 2 streams from Kafka -> parse JSON -> join both streams -> aggregate on a key over the last second -> output to Kafka, with: Join: inner join within an interval of one second, with a 50 ms watermark. Aggregation: tumbling window of one second, with
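The join stage of this pipeline can be sketched with the Structured Streaming API: each side gets a watermark, and the join condition bounds event time so Spark can discard old join state. This is only an approximation under stated assumptions. The built-in `rate` source stands in for the two parsed Kafka streams, the `leftKey`/`leftTs`/`rightKey`/`rightTs` columns are hypothetical, and "interval of one second" is interpreted here as the right event arriving within one second after the left one.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, expr}

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("stream-stream-join-sketch")
  .getOrCreate()

// Stand-in for one parsed Kafka stream: the "rate" source emits
// (timestamp, value) rows. Column names are hypothetical.
def fakeStream(prefix: String): DataFrame =
  spark.readStream
    .format("rate")
    .option("rowsPerSecond", "10")
    .load()
    .select((col("value") % 10).as(s"${prefix}Key"), col("timestamp").as(s"${prefix}Ts"))
    // Tolerate up to 50 ms of lateness on this side of the join.
    .withWatermark(s"${prefix}Ts", "50 milliseconds")

val left = fakeStream("left")
val right = fakeStream("right")

// Inner join on the key, restricted to right events at most one second
// after the matching left event; the time bound limits retained state.
val joined = left.join(
  right,
  expr("""
    leftKey = rightKey AND
    rightTs >= leftTs AND
    rightTs <= leftTs + interval 1 second
  """))
```

Nothing runs until a `writeStream` sink is attached and started. Whether a windowed aggregation placed after a stream-stream join is accepted, and with which output modes, depends on the Spark version, so the structured streaming guide for the version in use is the place to check before adding that stage.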