Hi.
I have been trying to collect a large dataset (about 2 GB in size, 30
columns, more than a million rows) onto the driver side. I am aware that
collecting such a huge dataset isn't recommended; however, the application
within which the Spark driver is running requires that data.
While collecting…
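A minimal sketch in Java of one way to handle this, assuming the driver genuinely needs every row locally: Dataset.toLocalIterator() streams the rows to the driver one partition at a time instead of materializing all ~2 GB at once via collect(). The table name and the maxResultSize value below are hypothetical placeholders.

import java.util.Iterator;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class CollectLargeDataset {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("collect-large-dataset")
                .config("spark.driver.maxResultSize", "4g")  // illustrative value; may still need tuning
                .getOrCreate();

        Dataset<Row> df = spark.table("my_db.big_table");    // hypothetical table name

        // Stream rows to the driver partition by partition instead of all at once.
        Iterator<Row> rows = df.toLocalIterator();
        while (rows.hasNext()) {
            Row row = rows.next();
            // hand each row to the driver-side application logic here
        }
        spark.stop();
    }
}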
Hi All,
I have been trying to serialize a dataframe in protobuf format. So far, I
have been able to serialize every row of the dataframe by using the map
function, with the serialization logic inside it (within the lambda
function). The resultant dataframe consists of rows in serialized…
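A minimal sketch in Java of the per-row serialization described above, assuming a generated protobuf message class RecordProto with a builder (hypothetical; substitute the class generated from your own .proto file). Here df stands for the existing Dataset<Row>, and the column names are placeholders.

import org.apache.spark.api.java.function.MapFunction;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;

// Map every Row to its protobuf-serialized byte[] form inside the map function.
Dataset<byte[]> serialized = df.map(
        (MapFunction<Row, byte[]>) row -> RecordProto.newBuilder()     // hypothetical generated class
                .setId(row.getLong(row.fieldIndex("id")))              // hypothetical columns
                .setName(row.getString(row.fieldIndex("name")))
                .build()
                .toByteArray(),
        Encoders.BINARY());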
Do you configure the same options
there? Can you share some code?

> On 07.08.2019 at 08:50, Rishikesh Gawade wrote:
>
> Hi.
> I am using Spark 2.3.2 and Hive 3.1.0.
> Even if I use Parquet files the result would be the same, because after all
> Spark SQL isn't able to d…
Hi.
I have built a Hive external table on top of a directory 'A' which has data
stored in ORC format. This directory has several subdirectories inside it,
each of which contains the actual ORC files.
These subdirectories are actually created by Spark jobs which ingest data
from other sources and…
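A minimal sketch in Java of one way to make Spark's Hive reader pick up ORC files that sit in subdirectories below the table's root directory 'A'. The two spark.hadoop.* keys are the standard Hadoop/Hive options for recursive input directories, passed through to the Hadoop configuration; please verify them against the Spark and Hive versions in use. The table name is a placeholder.

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

SparkSession spark = SparkSession.builder()
        .appName("hive-orc-subdirectories")
        .enableHiveSupport()
        // Allow the underlying input format to descend into subdirectories.
        .config("spark.hadoop.hive.mapred.supports.subdirectories", "true")
        .config("spark.hadoop.mapreduce.input.fileinputformat.input.dir.recursive", "true")
        .getOrCreate();

Dataset<Row> df = spark.sql("SELECT * FROM my_db.orc_external_table");  // hypothetical table
df.show(5);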
To put it simply, what are the configurations that need to be done on the
client machine so that it can run the driver on itself and the executors on
the Spark-on-YARN cluster nodes?
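A minimal sketch in Java of the client-side setup, assuming the client machine has the cluster's Hadoop/YARN configuration files available locally with HADOOP_CONF_DIR (or YARN_CONF_DIR) pointing at them, plus network access to the ResourceManager and NodeManagers. With master "yarn" and the default "client" deploy mode, the driver runs in this JVM and the executors are launched on the cluster nodes; the resource values below are illustrative only.

import org.apache.spark.sql.SparkSession;

SparkSession spark = SparkSession.builder()
        .appName("driver-on-client-machine")
        .master("yarn")                              // "client" deploy mode by default
        .config("spark.executor.instances", "4")     // illustrative values
        .config("spark.executor.memory", "2g")
        .getOrCreate();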
On Mon, Apr 22, 2019, 8:22 PM Rishikesh Gawade wrote:
> Hi.
> I have been experiencing trouble while trying to c…
Hi.
I have been experiencing trouble while trying to connect to a Spark cluster
remotely. This Spark cluster is configured to run using YARN.
Can anyone guide me or provide any step-by-step instructions for connecting
remotely via spark-shell?
Here's the setup that I am using:
The Spark cluster is…
Hi.
I wish to use a SparkSession created by one app in another app so that I
can use the dataframes belonging to that session. Is it possible to use the
same SparkSession in another app?
Thanks,
Rishikesh
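A minimal sketch in Java of the closest thing that does work, assuming both pieces of code run inside the same driver JVM: SparkSession.builder().getOrCreate() returns the already-active session, so temp views registered by one component are visible to the other. Across two separate Spark applications a session (and its dataframes) cannot be shared directly; the data would have to go through external storage or a shared service instead. The view name below is a placeholder.

import org.apache.spark.sql.SparkSession;

// Reuses the active session if one exists in this JVM, otherwise creates one.
SparkSession existing = SparkSession.builder().getOrCreate();
existing.sql("SELECT * FROM shared_temp_view").show();   // hypothetical temp view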
…suggest the required changes. Also, if it is the case that I might have
misconfigured Spark and Hive, please suggest the changes in
configuration; a link guiding through all necessary configs would also be
appreciated.
Thank you in anticipation.
Regards,
Rishikesh Gawade
…org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
I request you to please check this, and if anything is wrong then please
suggest an ideal way to read Hive tables on Hadoop in Spark using Java. A
link to a webpage with the relevant info would also be appreciated.
Thank you in anticipation.
Regards,
Rishikesh Gawade
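Since the question asks for a Java example, here is a minimal sketch of reading a Hive table from Spark. It assumes hive-site.xml is visible to Spark (e.g. in its conf directory) so the session can reach the metastore; the database and table names are placeholders.

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class ReadHiveTable {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("read-hive-table")
                .enableHiveSupport()                  // enables access to the Hive metastore
                .getOrCreate();

        Dataset<Row> df = spark.sql("SELECT * FROM my_db.my_table");
        df.show(10);
        spark.stop();
    }
}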