Sorry I missed the original channel, added it back.
-
I have limited knowledge of dbt. If it supports Hive, it should support Kyuubi.
Basically, Kyuubi is a gateway between your client (e.g. beeline, Hive
JDBC client) and a compute engine (e.g. Spark, Flink, Trino). I think the
most valuable
It can be used as a warehouse, but then you have to keep long-running Spark
jobs.
This can be done with cached DataFrames or Datasets.
Thanks
Deepak
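To make the gateway idea concrete, here is a minimal sketch of connecting a standard Hive client to Kyuubi; the host and user are placeholders, and 10009 is Kyuubi's default frontend port:

```shell
# Beeline speaks the HiveServer2 Thrift protocol, which Kyuubi implements,
# so the connection string looks just like a HiveServer2 one.
# kyuubi-host and spark_user are placeholders for your deployment.
beeline -u "jdbc:hive2://kyuubi-host:10009/default" -n spark_user \
        -e "SELECT 1"
```

Kyuubi then launches or reuses a Spark engine behind the gateway, which is what saves you from running your own long-lived Spark job per user.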
On Sat, 26 Mar 2022 at 5:56 AM, wrote:
For some time now we have been using Hive to build the data
warehouse.
Do you think Spark can be used for this purpose? It's even more real-time
than Hive.
Thanks.
-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org
We are using Spark on Kubernetes on AWS (it's a long story), but it does
work. It's still on the raw side, but we've been pretty successful.
We configured our cluster primarily with kube-aws and auto-scaling groups.
There are gotchas there, but so far we've been quite successful.
Gary Lucas
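For reference, a bare-bones submission against a Kubernetes master looks roughly like this; the API server endpoint, namespace, image, class, and jar path are all placeholders, while the `spark.kubernetes.*` keys are the standard ones from Spark's Kubernetes mode:

```shell
# Sketch only: submit a job to a Kubernetes API server in cluster mode.
# Every concrete value below is a placeholder for your environment.
spark-submit \
  --master k8s://https://kube-apiserver.example.com:6443 \
  --deploy-mode cluster \
  --name warehouse-job \
  --conf spark.executor.instances=4 \
  --conf spark.kubernetes.namespace=spark-jobs \
  --conf spark.kubernetes.container.image=my-registry/spark:latest \
  --class org.example.WarehouseJob \
  local:///opt/spark/jars/warehouse-job.jar
```

Note that Spark only controls executor pod counts; the auto-scaling groups size the node pool underneath.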
Thanks, everyone, for the suggestions. Do any of you take care of auto
scaling up and down your underlying Spark clusters on AWS?
On Nov 14, 2017 10:46 AM, "lucas.g...@gmail.com"
wrote:
Hi Ashish, bear in mind that EMR has some additional tooling available that
smooths out some S3 problems that you may / almost certainly will
encounter.
We are using Spark / S3 not on EMR and have encountered issues with file
consistency; you can deal with it, but be aware it's additional work.
Another option that we are trying internally is to use Mesos for isolating
different jobs or groups. Within a single group, using Livy to create
different Spark contexts also works.
- Affan
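A sketch of the Livy approach, using only the documented /sessions and /statements REST routes; the Livy URL and the YARN queue name are placeholder assumptions:

```python
import json
from urllib import request

LIVY_URL = "http://livy-host:8998"  # placeholder endpoint

def session_payload(kind="pyspark", queue=None):
    """JSON body for POST /sessions; one session = one isolated Spark context."""
    body = {"kind": kind}
    if queue is not None:
        # Optional: pin each group's sessions to its own YARN queue.
        body["conf"] = {"spark.yarn.queue": queue}
    return body

def create_session(kind="pyspark", queue=None):
    """Ask Livy to start a new Spark context; returns the session JSON."""
    req = request.Request(
        LIVY_URL + "/sessions",
        data=json.dumps(session_payload(kind, queue)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)

def submit_statement(session_id, code):
    """Run a snippet in an existing session via POST /sessions/{id}/statements."""
    req = request.Request(
        f"{LIVY_URL}/sessions/{session_id}/statements",
        data=json.dumps({"code": code}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)
```

Each team or group can hold its own session id, giving per-group contexts without sharing one driver.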
On Tue, Nov 14, 2017 at 8:43 AM, ashish rawat wrote:
Thanks Sky Yin. This really helps.
On Nov 14, 2017 12:11 AM, "Sky Yin" wrote:
We are running Spark in AWS EMR as a data warehouse. All data are in S3 and
metadata in the Hive metastore.
We have internal tools to create Jupyter notebooks on the dev cluster. I
guess you can use Zeppelin instead, or Livy?
We run Genie as a job server for the prod cluster, so users have to submit
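The layout described here (compute in Spark, data in S3, table definitions in the Hive metastore) can be sketched as below; the database and table names are made-up placeholders, and `pyspark` is imported lazily so the helper can be shown without a cluster:

```python
def daily_orders_sql(day):
    """Build an example aggregation query; sales.orders is a placeholder
    table whose metastore entry points at an S3 location."""
    return (
        "SELECT country, count(*) AS orders "
        "FROM sales.orders "
        f"WHERE ds = '{day}' "
        "GROUP BY country"
    )

def run_warehouse_query(day):
    """Run the query on a Hive-metastore-backed SparkSession (sketch only)."""
    from pyspark.sql import SparkSession  # lazy: only needed on the cluster

    spark = (
        SparkSession.builder
        .appName("warehouse-query")
        .enableHiveSupport()   # resolve tables through the Hive metastore
        .getOrCreate()
    )
    # The metastore maps sales.orders to its s3:// location, so plain SQL
    # works without hard-coding bucket paths.
    return spark.sql(daily_orders_sql(day))
```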
> livy-a-rest-interface-for-apache-spark/
>
> Livy Server will allow you to create multiple Spark contexts via REST:
> https://livy.incubator.apache.org/
>
> If you are looking for broad SQL functionality I'd recommend instantiating
> a Hive context. And Spark is able to spill to disk:
> https://spark.apache.org/faq.html
There are companies running Spark within their data warehouse
solutions:
https://ibmdatawarehousing.wordpress.com/2016/10/12/steinbach_dashdb_local_spark/
Edmunds used Spark to allow business analysts to point Spark to files in S3 and
infer schema: https://www.youtube.com/watch?v=gsR1ljgZLq0
Recommend running s
but this can happen irrespective of whether one user or many is using
the cluster at a given moment in time.
Does this help?
Regards,
Phillip
On Sun, Nov 12, 2017 at 5:50 PM, ashish rawat <dceash...@gmail.com> wrote:
Thanks Jorn and Phillip. My question was specifically for anyone who has
tried creating a system using Spark SQL as a data warehouse. I was trying to
check if someone has tried it and whether they can help with the kinds of
workloads that worked and the ones that had problems.
Regarding spill to disk
Agree with Jorn. The answer is: it depends.
In the past, I've worked with data scientists who are happy to use the
Spark CLI. Again, the answer is "it depends" (in this case, on the skills
of your customers).
Regarding sharing resources, different teams were limited to their own
queue so they
What do you mean by all possible workloads?
You cannot prepare any system to do all possible processing.
We do not know the requirements of your data scientists now or in the future, so
it is difficult to say. How do they work currently without the new solution? Do
they all work on the same data? I
I am looking for a similar solution, more aligned to a data scientist group.
The concern I have is about supporting complex aggregations at runtime.
Thanks
Deepak
On Nov 12, 2017 12:51, "ashish rawat" wrote:
Hello Everyone,
I was trying to understand if anyone here has tried a data warehouse
solution using S3 and Spark SQL. Out of multiple possible options
(Redshift, Presto, Hive, etc.), we were planning to go with Spark SQL for
our aggregates and processing requirements.
If anyone has tried it out,