Re: Migrating from hive to spark

2021-06-17 Thread Mich Talebzadeh
OK, the first link gives some clues:

*... Hive excels in batch disc processing with a map reduce execution
engine. Actually, Hive can also use Spark as its execution engine which
also has a Hive context allowing us to query Hive tables. Despite all the
great things Hive can solve, this post is to talk about why we move our
ETL’s to the ‘not so new’ player for batch processing, ...*

Great, so you want to use Spark rather than Hive for ETL, that is, for cleaning
up your data once your upstream CDC files have landed on HDFS. Is that correct?





   view my Linkedin profile
<https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>



*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.




On Thu, 17 Jun 2021 at 08:17, Battula, Brahma Reddy wrote:

> Hi Talebzadeh,
>
>
>
> It looks like I caused some confusion, sorry. I have now changed the subject
> to make it clear.
>
> Facebook has tried migrating from Hive to Spark. See the following links
> for details.
>
> https://www.dcsl.com/migrating-from-hive-to-spark/
>
> https://databricks.com/session/experiences-migrating-hive-workload-to-sparksql
>
> https://www.cloudwalker.io/2019/02/19/spark-ad-hoc-querying/
>
> I would like to know whether anybody else has migrated like this, and whether
> there are any challenges or prerequisites for migrating (such as hardware).
> Are there any tools to evaluate before we migrate?
>
>
>
>
>
>
>
>
>
> *From: *Mich Talebzadeh 
> *Date: *Tuesday, 15 June 2021 at 10:36 PM
> *To: *Battula, Brahma Reddy 
> *Cc: *Battula, Brahma Reddy , ayan guha <
> guha.a...@gmail.com>, d...@spark.apache.org ,
> user@spark.apache.org 
> *Subject: *Re: Spark-sql can replace Hive ?
>
> OK you mean use spark.sql as opposed to HiveContext.sql?
>
>
>
> val HiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
>
> HiveContext.sql("")
>
>
>
> replace with
>
>
>
> spark.sql("")
>
> ?
>
> On Tue, 15 Jun 2021 at 18:00, Battula, Brahma Reddy 
> wrote:
>
> Currently I am using the Hive SQL engine for ad hoc queries. As spark-sql also
> supports this, I want to migrate from Hive.
>
> *From: *Mich Talebzadeh 
> *Date: *Thursday, 10 June 2021 at 8:12 PM
> *To: *Battula, Brahma Reddy 
> *Cc: *ayan guha , d...@spark.apache.org <
> d...@spark.apache.org>, user@spark.apache.org 
> *Subject: *Re: Spark-sql can replace Hive ?
>
> These are different things. Spark provides a computational layer and a
> dialect of SQL based on Hive.
>
>
>
> Hive is a DW on top of HDFS. What are you trying to replace?
>
>
>
> HTH
>
> On Thu, 10 Jun 2021 at 12:09, Battula, Brahma Reddy
>  wrote:

Migrating from hive to spark

2021-06-17 Thread Battula, Brahma Reddy
Hi Talebzadeh,

It looks like I caused some confusion, sorry. I have now changed the subject to
make it clear.
Facebook has tried migrating from Hive to Spark. See the following links for
details.

https://www.dcsl.com/migrating-from-hive-to-spark/
https://databricks.com/session/experiences-migrating-hive-workload-to-sparksql
https://www.cloudwalker.io/2019/02/19/spark-ad-hoc-querying/


I would like to know whether anybody else has migrated like this, and whether
there are any challenges or prerequisites for migrating (such as hardware).
Are there any tools to evaluate before we migrate?




From: Mich Talebzadeh 
Date: Tuesday, 15 June 2021 at 10:36 PM
To: Battula, Brahma Reddy 
Cc: Battula, Brahma Reddy, ayan guha, d...@spark.apache.org, user@spark.apache.org
Subject: Re: Spark-sql can replace Hive ?
OK you mean use spark.sql as opposed to HiveContext.sql?

val HiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
HiveContext.sql("")

replace with

spark.sql("")
?
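
For reference, the replacement asked about above can be sketched roughly as follows. This is only a minimal sketch, assuming Spark 2.x or later with the spark-hive module on the classpath and a reachable Hive metastore; the object name, application name and the query are hypothetical placeholders, not anything taken from this thread.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical object name, used only for illustration.
object HiveToSparkSqlSketch {
  def main(args: Array[String]): Unit = {
    // enableHiveSupport() connects the session to the Hive metastore,
    // so existing Hive tables can be queried through spark.sql.
    val spark = SparkSession.builder()
      .appName("HiveToSparkSqlSketch") // placeholder application name
      .enableHiveSupport()
      .getOrCreate()

    // Old style (HiveContext is deprecated since Spark 2.0):
    //   val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
    //   hiveContext.sql("...")

    // New style: the same SQL text is issued through the session.
    // "SHOW DATABASES" is only a placeholder query.
    spark.sql("SHOW DATABASES").show()

    spark.stop()
  }
}
```

In spark-shell a session named spark already exists, so only the spark.sql(...) call is needed there. Whether that session sees the Hive metastore depends on how Spark was built and configured.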



 




On Tue, 15 Jun 2021 at 18:00, Battula, Brahma Reddy <bbatt...@visa.com> wrote:
Currently I am using the Hive SQL engine for ad hoc queries. As spark-sql also
supports this, I want to migrate from Hive.




From: Mich Talebzadeh <mich.talebza...@gmail.com>
Date: Thursday, 10 June 2021 at 8:12 PM
To: Battula, Brahma Reddy
Cc: ayan guha <guha.a...@gmail.com>, d...@spark.apache.org, user@spark.apache.org
Subject: Re: Spark-sql can replace Hive ?
These are different things. Spark provides a computational layer and a dialect
of SQL based on Hive.

Hive is a DW on top of HDFS. What are you trying to replace?

HTH





 




On Thu, 10 Jun 2021 at 12:09, Battula, Brahma Reddy wrote:
Thanks for the prompt reply.

I want to replace Hive with Spark.




From: ayan guha <guha.a...@gmail.com>
Date: Thursday, 10 June 2021 at 4:35 PM
To: Battula, Brahma Reddy
Cc: d...@spark.apache.org, user@spark.apache.org
Subject: Re: Spark-sql can replace Hive ?
Would you mind expanding on the ask? Spark SQL can use Hive by itself.

On Thu, 10 Jun 2021 at 8:58 pm, Battula, Brahma Reddy 
 wrote:
Hi

I would like to know of any references/docs on replacing Hive with spark-sql
completely, for example how to migrate the existing data in Hive.

thanks


--
Best Regards,
Ayan Guha