Re: spark as data warehouse?

2022-03-26, Cheng Pan
Sorry, I missed the original channel; I've added it back. I don't know much about dbt, but if it supports Hive, it should support Kyuubi. Basically, Kyuubi is a gateway between your client (e.g. Beeline, the Hive JDBC client) and the compute engine (e.g. Spark, Flink, Trino). I think the most valuable
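To make the "gateway" point concrete: because Kyuubi speaks the HiveServer2 Thrift protocol, an existing Hive client such as Beeline connects to it with an ordinary `jdbc:hive2://` URL. The sketch below only builds the command line. The host name and user are hypothetical, and port 10009 is assumed to be Kyuubi's default frontend port, so check your own deployment.

```python
import shlex
from typing import List


def beeline_command(host: str, port: int, user: str, query: str) -> List[str]:
    """Build a Beeline invocation against a HiveServer2-compatible endpoint.

    Kyuubi implements the HiveServer2 Thrift protocol, so the same
    jdbc:hive2:// URL scheme used for Hive works; only host/port differ.
    """
    jdbc_url = f"jdbc:hive2://{host}:{port}/"
    # -u: JDBC URL, -n: user name, -e: run a single query and exit
    return ["beeline", "-u", jdbc_url, "-n", user, "-e", query]


# Hypothetical host and user; 10009 is assumed to be Kyuubi's default port.
cmd = beeline_command("kyuubi.example.com", 10009, "analyst", "SELECT 1")
print(shlex.join(cmd))
# prints: beeline -u jdbc:hive2://kyuubi.example.com:10009/ -n analyst -e 'SELECT 1'
```

Passing `cmd` to `subprocess.run` would execute it, assuming Beeline is on the PATH and a Kyuubi server is reachable. Other Hive JDBC clients should be able to reuse the same URL.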

Re: spark as data warehouse?

2022-03-25, Deepak Sharma
It can be used as a warehouse, but then you have to keep long-running Spark jobs. This is possible using cached DataFrames or Datasets. Thanks, Deepak

spark as data warehouse?

2022-03-25, capitnfrakass
In the past we have used Hive to build the data warehouse. Do you think Spark can be used for this purpose? It's even more real-time than Hive. Thanks.

Re: Spark based Data Warehouse

2017-11-17, lucas.g...@gmail.com
We are using Spark on Kubernetes on AWS (it's a long story), but it does work. It's still on the raw side, but we've been pretty successful. We configured our cluster primarily with Kube-AWS and auto scaling groups. There are gotchas there, but so far we've been quite successful. Gary Lucas

Re: Spark based Data Warehouse

2017-11-17, ashish rawat
Thanks, everyone, for your suggestions. Do any of you auto-scale your underlying Spark clusters up and down on AWS?

Re: Spark based Data Warehouse

2017-11-13, lucas.g...@gmail.com
Hi Ashish, bear in mind that EMR has some additional tooling that smooths out some S3 problems you may (almost certainly will) encounter. We are using Spark with S3 outside of EMR and have encountered file-consistency issues. You can deal with them, but be aware it's additional

Re: Spark based Data Warehouse

2017-11-13, Affan Syed
Another option we are trying internally is to use Mesos to isolate different jobs or groups. Within a single group, using Livy to create different Spark contexts also works. - Affan
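Affan's point about using Livy to give each group its own Spark context maps onto Livy's REST API: each `POST /sessions` starts a separate interactive session with its own SparkContext, and code is then sent to `POST /sessions/{id}/statements`. The sketch below only constructs the requests with the standard library and does not send them. The server URL and team names are hypothetical, and 8998 is assumed to be Livy's default port.

```python
import json
import urllib.request

# Hypothetical Livy endpoint; 8998 is assumed to be Livy's default port.
LIVY_URL = "http://livy.example.com:8998"


def _post_json(path: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request to the Livy server."""
    return urllib.request.Request(
        LIVY_URL + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def create_session(kind: str, name: str) -> urllib.request.Request:
    # Each session Livy starts runs its own SparkContext.
    return _post_json("/sessions", {"kind": kind, "name": name})


def run_statement(session_id: int, code: str) -> urllib.request.Request:
    # Statements run inside an existing session's SparkContext.
    return _post_json(f"/sessions/{session_id}/statements", {"code": code})


# One isolated session per (hypothetical) team.
session_reqs = [create_session("pyspark", team) for team in ("team-a", "team-b")]
stmt_req = run_statement(0, "sc.parallelize(range(100)).sum()")

for req in session_reqs + [stmt_req]:
    print(req.get_method(), req.full_url, req.data.decode("utf-8"))
```

Sending these with `urllib.request.urlopen(req)` against a live server would return JSON describing the new session, including its `id`, which you would use in place of the hard-coded `0`.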

Re: Spark based Data Warehouse

2017-11-13, ashish rawat
Thanks, Sky Yin. This really helps.

Re: Spark based Data Warehouse

2017-11-13, Sky Yin
We are running Spark on AWS EMR as our data warehouse. All data is in S3 and the metadata is in the Hive metastore. We have internal tools to create Jupyter notebooks on the dev cluster. I guess you could use Zeppelin instead, or Livy? We run Genie as a job server for the prod cluster, so users have to submit

Re: Spark based Data Warehouse

2017-11-13, Deepak Sharma
> livy-a-rest-interface-for-apache-spark/ > Livy Server will allow you to create multiple Spark contexts via REST: https://livy.incubator.apache.org/ > If you are looking for broad SQL functionality I’d recommend instantiat

Re: Spark based Data Warehouse

2017-11-13, ashish rawat
ou to create multiple Spark contexts via REST: https://livy.incubator.apache.org/ > If you are looking for broad SQL functionality I’d recommend instantiating a Hive context. And Spark is able to spill to disk: https://spark.apache.org/faq.html > Th

Re: Spark based Data Warehouse

2017-11-12, Patrick Alwell
ies running Spark within their data warehouse solutions: https://ibmdatawarehousing.wordpress.com/2016/10/12/steinbach_dashdb_local_spark/ Edmunds used Spark to let business analysts point Spark at files in S3 and infer the schema: https://www.youtube.com/watch?v=gsR1ljgZLq0 Recommend running s

Re: Spark based Data Warehouse

2017-11-12, Vadim Semenov
any is using the cluster at a given moment in time. Does this help? Regards, Phillip > On Sun, Nov 12, 2017 at 5:50 PM, ashish rawat <dceash...@gmail.com> wrote: > Thanks Jorn and

Re: Spark based Data Warehouse

2017-11-12, Gourav Sengupta
but this can happen irrespective of whether one user or many is using the cluster at a given moment in time. Does this help? Regards, Phillip > On Sun, Nov 12, 2017 at 5:50 PM, ashish rawat <dceash...@gmail.com> wrote: > Thanks Jorn and Philli

Re: Spark based Data Warehouse

2017-11-12, Phillip Henry
hillip. My question was specifically to anyone who have tried creating a system using spark SQL, as Data Warehouse. I was trying to check, if someone has tried it and they can help with the kind of workloads which worked and the ones, which have problems. Regarding spill to

Re: Spark based Data Warehouse

2017-11-12, ashish rawat
Thanks Jorn and Phillip. My question was specifically for anyone who has tried building a data warehouse on Spark SQL. I wanted to check whether someone has tried it and could describe which kinds of workloads worked and which had problems. Regarding spill to disk

Re: Spark based Data Warehouse

2017-11-12, Phillip Henry
I agree with Jorn. The answer is: it depends. In the past, I've worked with data scientists who are happy to use the Spark CLI. Again, the answer is "it depends" (in this case, on the skills of your customers). Regarding sharing resources, different teams were limited to their own queue, so they
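One way to confine each team to its own queue on Spark-on-YARN is to choose the queue at submit time with `spark-submit --queue` (equivalently, the `spark.yarn.queue` property). A minimal sketch that builds such a command; the team-to-queue mapping and the application path are hypothetical, and the queues' capacity limits themselves would be defined in the YARN scheduler configuration, not in Spark.

```python
from typing import Dict, List

# Hypothetical mapping from team to a YARN queue defined in the scheduler config.
TEAM_QUEUES: Dict[str, str] = {"data-science": "ds", "reporting": "bi"}


def spark_submit_args(team: str, app: str, *app_args: str) -> List[str]:
    """Build a spark-submit command that pins the job to the team's YARN queue."""
    queue = TEAM_QUEUES[team]  # raises KeyError for unknown teams
    return [
        "spark-submit",
        "--master", "yarn",
        "--deploy-mode", "cluster",
        "--queue", queue,  # YARN-only option; same effect as --conf spark.yarn.queue=...
        app,
        *app_args,
    ]


args = spark_submit_args("data-science", "jobs/daily_aggregates.py", "2017-11-12")
print(" ".join(args))
# prints: spark-submit --master yarn --deploy-mode cluster --queue ds jobs/daily_aggregates.py 2017-11-12
```

The design choice here is that the queue, not the job, carries the limit: assuming each queue has a maximum capacity configured in YARN, a heavy query from one team should only consume that team's share of the cluster.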

Re: Spark based Data Warehouse

2017-11-12, Jörn Franke
What do you mean by all possible workloads? You cannot prepare any system to do all possible processing. We do not know the requirements of your data scientists now or in the future, so it is difficult to say. How do they work currently without the new solution? Do they all work on the same data? I

Re: Spark based Data Warehouse

2017-11-11, Deepak Sharma
I am looking for a similar solution, more aligned to a data science group. My concern is about supporting complex aggregations at runtime. Thanks, Deepak

Spark based Data Warehouse

2017-11-11, ashish rawat
Hello everyone, I was trying to understand whether anyone here has tried a data warehouse solution using S3 and Spark SQL. Out of multiple possible options (Redshift, Presto, Hive, etc.), we were planning to go with Spark SQL for our aggregation and processing requirements. If anyone has tried it out,