Yes. Yes you can.
On Tue, Jul 17, 2018 at 11:42 AM, Sathi Chowdhury wrote:
> Hi,
> My question is about the ability to integrate Spark Streaming with multiple
> clusters. Is it a supported use case? An example is two topics
> owned by different groups that each have their own Kafka infra
Just found out that I need the following option while reading:
.option("basePath", "hdfs://localhost:9000/ptest/")
https://stackoverflow.com/questions/43192940/why-is-partition-key-column-missing-from-dataframe
On Tue, Jul 17, 2018 at 3:48 PM, Nirav Patel wrote:
> I created a hive table with
I created a Hive table with Parquet storage using Spark SQL. In the Hive CLI,
when I run DESCRIBE and SELECT, I can see the partition columns both as
regular columns and as partition columns. However, if I try the same in
Spark SQL (DataFrame), I don't see the partition columns.
I need to do projection
Thanks 0xF0F0F0 and Ashutosh for the pointers.
Holden,
I am trying to look into sparklingml...what am I looking for? Also which
chapter/page of your book should I look at?
Mohit.
On Sun, Jul 15, 2018 at 3:02 AM Holden Karau wrote:
> If you want to see some examples in a library shows a way to
Perhaps this is https://issues.apache.org/jira/browse/SPARK-24578?
That was reported as a performance issue, not OOMs, but it's in the exact
same part of the code and the change was to reduce the memory pressure
significantly.
On Mon, Jul 16, 2018 at 1:43 PM, Bryan Jeffrey
wrote:
> Hello.
>
> I
Hi,
My question is about the ability to integrate Spark Streaming with multiple
clusters. Is it a supported use case? An example is two topics
owned by different groups that each have their own Kafka infra.
Can I have two DataFrames as a result of spark.readStream listening to
different
this may work
val df_post = listCustomCols
  .foldLeft(df_pre) { (tempDF, listValue) =>
    tempDF.withColumn(
      listValue.name,
      new Column(listValue.name.toString + funcUDF(listValue.name))
    )
  }
and outsource the renaming to a UDF
or you can rename the
Hi,
This is a very general question. It's hard to answer without
fully understanding your business and technological needs.
You might want to watch this video:
https://www.youtube.com/watch?v=2UKSLHDH5vc=8s
Shmuel
On Tue, Jul 17, 2018 at 12:11 AM Gautam Singaraju <
30G user data, how to get distinct users count after creating a composite key
based on company and userid?
On 2018-07-13 18:24:52, Jean Georges Perrin wrote:
Just thinking out loud… repartition by key? create a composite key based on
company and userid?
How big is your dataset?
On Jul 13, 2018,
Hi guys,
I'm trying to profile my Spark code with cProfile to see where most of the
time goes. Most of the time is spent in some socket object, and I have no
idea where it is used.
Can anyone shed some light on this?
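One way to narrow this down is to ask the profiler who calls the socket method, rather than only how long it takes. The sketch below uses only the standard library (cProfile and pstats). `wait_on_socket` is a hypothetical stand-in for the real job, and `profile_callers` is an illustrative helper, not part of Spark. In PySpark the driver talks to the JVM over a local Py4J socket, so time attributed to a socket `recv` in a driver-side profile is often just Python waiting for the JVM to finish its work; that is an assumption about this case, not a diagnosis.

```python
import cProfile
import io
import pstats
import socket
import threading
import time


def wait_on_socket(delay=0.05):
    """Hypothetical stand-in for a driver call that blocks on a socket."""
    left, right = socket.socketpair()

    def reply_later():
        # Simulates the remote side doing work before it answers.
        time.sleep(delay)
        right.sendall(b"done")

    worker = threading.Thread(target=reply_later)
    worker.start()
    try:
        return left.recv(4)  # blocks until the other side answers
    finally:
        worker.join()
        left.close()
        right.close()


def profile_callers(func, pattern):
    """Profile func() and return a text report of who calls functions matching pattern."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        func()
    finally:
        profiler.disable()
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats("cumulative")
    # Restrict the report to profiled functions whose name matches the pattern.
    stats.print_callers(pattern)
    return out.getvalue()


report = profile_callers(wait_on_socket, "recv")
print(report)
```

In a real job, replace `wait_on_socket` with the function that triggers the Spark action and read the callers section of the report: if the caller is inside py4j, the time is spent waiting on the JVM, and the Spark UI is the better place to look for the actual cost.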