ForEachBatch collecting batch to driver

2020-03-10 Thread Ruijing Li
Hi all,

I'm curious about how foreachBatch works in Spark Structured Streaming.
Since it takes in a micro-batch DataFrame, does that mean the code in
foreachBatch executes on the Spark driver? And does this mean that for
large batches you could potentially hit OOM issues from collecting each
partition onto the driver?
-- 
Cheers,
Ruijing Li
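
A minimal PySpark sketch related to the question above, offered as an
illustration rather than a definitive answer. The function passed to
foreachBatch is invoked on the driver once per micro-batch, but the
batch it receives is still a regular distributed DataFrame, so
transformations and writes on it run on the executors. Rows reach the
driver only if the function itself calls something like collect() or
toPandas(). The names write_batch, run_demo, OUTPUT_PATH and
CHECKPOINT_PATH, the paths, and the "rate" test source are assumptions
made for this example, not part of the original thread.

```python
# Hypothetical locations chosen for this sketch only.
OUTPUT_PATH = "/tmp/foreachbatch_demo/out"
CHECKPOINT_PATH = "/tmp/foreachbatch_demo/checkpoint"


def write_batch(batch_df, batch_id):
    """Called on the driver once per micro-batch.

    batch_df is a distributed DataFrame: the write below is executed by
    the executors, partition by partition, and no rows are pulled into
    driver memory.
    """
    batch_df.write.mode("append").parquet(OUTPUT_PATH)
    # By contrast, batch_df.collect() or batch_df.toPandas() here WOULD
    # materialize the whole micro-batch on the driver and could cause OOM.


def run_demo(seconds=30):
    """Wire write_batch into a streaming query (requires pyspark)."""
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("foreachBatchSketch").getOrCreate()

    # The built-in "rate" source generates rows for testing purposes.
    stream = (
        spark.readStream.format("rate")
        .option("rowsPerSecond", 10)
        .load()
    )

    query = (
        stream.writeStream
        .foreachBatch(write_batch)
        .option("checkpointLocation", CHECKPOINT_PATH)
        .start()
    )

    query.awaitTermination(seconds)  # timeout is in seconds
    query.stop()
    spark.stop()
```

Calling run_demo() in an environment with PySpark installed runs the
stream for about 30 seconds. The driver-side memory risk comes from what
the batch function does with the DataFrame, not from foreachBatch itself.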


Spark Submit through yarn is failing with Default queue.

2020-03-10 Thread SB M
Hi All,

I am trying to submit my application using spark-submit in YARN mode.

But it is failing because of an unknown queue "default". We specified the
queue name in spark-default.conf as spark.yarn.queue SecondaryQueue.

It is failing for one application but not for another, and I don't know
the reason.

Please help me with this.

Regards,
SBM


Re: Read Hive ACID Managed table in Spark

2020-03-10 Thread Chetan Khatri
Hi Venkata,
Thanks for your reply. I am using HDP 2.6, and I don't think the above will
work for me. Any other suggestions? Thanks.

On Thu, Mar 5, 2020 at 8:24 AM venkata naidu udamala <
vudamala.gyan...@gmail.com> wrote:

> You can try using the Hive Warehouse Connector:
> https://docs.cloudera.com/HDPDocuments/HDP3/HDP-3.1.5/integrating-hive/content/hive_hivewarehouseconnector_for_handling_apache_spark_data.html
>
> On Thu, Mar 5, 2020, 6:51 AM Chetan Khatri 
> wrote:
>
>> Just following up, in case anyone has worked on this before.
>>
>> On Wed, Mar 4, 2020 at 12:09 PM Chetan Khatri <
>> chetan.opensou...@gmail.com> wrote:
>>
>>> Hi Spark Users,
>>> I want to read Hive ACID managed table data (ORC) in Spark. Can someone
>>> help me here?
>>> I've tried https://github.com/qubole/spark-acid, but with no success.
>>>
>>> Thanks
>>>
>>