Re: spark as data warehouse?

2022-03-25 Thread Deepak Sharma
It can be used as a warehouse, but then you have to keep long-running Spark jobs. This is possible using cached DataFrames or Datasets. Thanks Deepak On Sat, 26 Mar 2022 at 5:56 AM, wrote: > In the past time we have been using hive for building the data > warehouse. > Do you think if spark
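A minimal sketch of the cached-DataFrame idea, assuming the warehouse table is stored as Parquet. The helper name `cache_as_view`, the path, and the view name are hypothetical; the calls used (`spark.read.parquet`, `cache`, `createOrReplaceTempView`, `count`) are standard PySpark. The cache only lives as long as the Spark application, which is why the job has to be long-running.

```python
def cache_as_view(spark, source_path, view_name):
    """Load a table, cache it, and expose it to SQL for the life of the application.

    `spark` is an existing SparkSession; `source_path` and `view_name`
    are illustrative placeholders, not names from the thread.
    """
    df = spark.read.parquet(source_path)   # assumption: data is stored as Parquet
    df.cache()                             # lazy: only marks the DataFrame for caching
    df.createOrReplaceTempView(view_name)  # makes it queryable via spark.sql(...)
    df.count()                             # an action, so the cache is actually populated
    return df
```

Later queries in the same application, for example `spark.sql("SELECT ... FROM <view_name>")`, would then read from the cached data instead of reloading the files.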

spark as data warehouse?

2022-03-25 Thread capitnfrakass
In the past we have been using Hive to build the data warehouse. Do you think Spark can be used for this purpose? It's even more real-time than Hive. Thanks.

Re: Question for so many SQL tools

2022-03-25 Thread Bjørn Jørgensen
No, they are not doing the same thing. But everyone knows SQL. SQL has been around since the 1970s. Apache Drill is for NoSQL; Spark is for everything you will do with data. All of them have their pros and cons. You just have to find what's best for your task. On Fri, 25 Mar 2022 at 22:32, Bitfox wrote

Re: GraphX Support

2022-03-25 Thread Bjørn Jørgensen
Yes, MLlib is actively developed. You can have a look at GitHub and filter on closed and ML. On Fri, 25 Mar 2022 at 22:15, Bitfox wrote: > BTW, is MLlib

Question for so many SQL tools

2022-03-25 Thread Bitfox
Just a question: why are there so many SQL-based tools for data jobs? The ones I know: Spark, Flink, Ignite, Impala, Drill, Hive … They do similar jobs, IMO. Thanks

Re: GraphX Support

2022-03-25 Thread Bitfox
BTW, is MLlib still in active development? Thanks On Tue, Mar 22, 2022 at 07:11 Sean Owen wrote: > GraphX is not active, though still there and does continue to build and > test with each Spark release. GraphFrames kind of superseded it, but is > also not super active FWIW. > > On Mon, Mar

Re: [EXTERNAL] Re: GraphX Support

2022-03-25 Thread Bjørn Jørgensen
One alternative is to use Spark with ArangoDB; see "Introducing the new ArangoDB Datasource for Apache Spark". ArangoDB is an open source graph DB with a lot of good graph utilities

Re: [Spark SQL] Structured Streaming in Python can connect to cassandra ?

2022-03-25 Thread Gourav Sengupta
Hi, I completely agree with Alex. Also, if you are just writing to Cassandra, what is the purpose of writing to a Kafka broker? People generally seem to feel that adding more components to their architecture is great, but sadly it is not. Remove the Kafka broker, in case you are not

Re: [Spark SQL] Structured Streaming in Python can connect to cassandra ?

2022-03-25 Thread Alex Ott
You don't need to use foreachBatch to write to Cassandra. You just need to use Spark Cassandra Connector version 2.5.0 or higher - it supports native writing of stream data into Cassandra. Here is an announcement: https://www.datastax.com/blog/advanced-apache-cassandra-analytics-now-open-all
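A minimal sketch of the native streaming write described above, assuming Spark Cassandra Connector 2.5.0 or higher is on the classpath and the target table already exists. The helper name `start_cassandra_sink` and its parameters are hypothetical; `org.apache.spark.sql.cassandra` with the `keyspace` and `table` options is the connector's DataFrame source format.

```python
def start_cassandra_sink(stream_df, keyspace, table, checkpoint_dir):
    """Start a streaming query that appends rows of `stream_df` to a Cassandra table.

    `stream_df` is a streaming DataFrame (for example one read from Kafka);
    keyspace, table, and checkpoint_dir are illustrative placeholders.
    """
    return (
        stream_df.writeStream
        .format("org.apache.spark.sql.cassandra")       # connector's data source name
        .option("checkpointLocation", checkpoint_dir)   # needed for fault-tolerant streaming writes
        .option("keyspace", keyspace)
        .option("table", table)
        .outputMode("append")
        .start()
    )
```

The DataFrame's column names need to match the columns of the Cassandra table; no `foreachBatch` wrapper is involved.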

Re: Cannot compare columns directly in IF...ELSE statement

2022-03-25 Thread Balakrishnan Ayyappan
Not sure if I understood the question correctly, but did you try using `case when`? Thanks, Bala On Fri, Mar 25, 2022, 12:44 PM Sid wrote: > Hi Team, > > I need help with the below problem: > > > https://stackoverflow.com/questions/71613292/how-to-use-columns-in-if-else-condition-in-pyspark
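A rough sketch of the `case when` suggestion, since a Python `if`/`else` cannot branch on Spark columns row by row. The details of the linked question are not reproduced here, so the helper name `add_comparison_column`, the column names, and the labels are all hypothetical; `DataFrame.selectExpr` and the SQL `CASE WHEN` expression are standard Spark.

```python
def add_comparison_column(df, left, right, out="comparison"):
    """Return `df` with an extra column describing how `left` compares to `right`.

    The comparison is evaluated per row by Spark SQL, not by Python.
    Rows where either side is NULL fall through to the ELSE branch.
    """
    case_expr = (
        f"CASE WHEN `{left}` > `{right}` THEN 'greater' "
        f"WHEN `{left}` < `{right}` THEN 'less' "
        f"WHEN `{left}` = `{right}` THEN 'equal' "
        f"ELSE 'unknown' END AS `{out}`"
    )
    return df.selectExpr("*", case_expr)
```

The same logic can be written with `pyspark.sql.functions.when(...).otherwise(...)` if the column API is preferred over a SQL string.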

[ANNOUNCE] Apache Kyuubi (Incubating) released 1.5.0-incubating

2022-03-25 Thread Kent Yao
Hi all, The Apache Kyuubi (Incubating) community is pleased to announce that Apache Kyuubi (Incubating) 1.5.0-incubating has been released! Apache Kyuubi (Incubating) is a distributed multi-tenant JDBC server for large-scale data processing and analytics, built on top of Apache Spark and

Cannot compare columns directly in IF...ELSE statement

2022-03-25 Thread Sid
Hi Team, I need help with the below problem: https://stackoverflow.com/questions/71613292/how-to-use-columns-in-if-else-condition-in-pyspark Thanks, Sid