Are you a data scientist or data engineer?
On Thu, Jul 4, 2019 at 10:34 PM Vikas Garg wrote:
> Hi,
>
> I am a new Spark learner. Can someone guide me on a strategy for
> gaining expertise in PySpark?
>
> Thanks!!!
>
Is there a reason why case classes won't work for your use case?
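For example, here is a minimal sketch of a UDF that returns a case class (all names are illustrative). Spark can derive a struct schema for a case class through reflection; an arbitrary class has no such derivation, which is what raises the UnsupportedOperationException quoted below.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.udf

// A case class return type lets Spark derive the struct schema.
case class Person(name: String, age: Int)

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

// The UDF's case-class result becomes a struct column.
val makePerson = udf((name: String, age: Int) => Person(name, age))

val df = Seq(("Alice", 30), ("Bob", 41)).toDF("name", "age")
df.withColumn("person", makePerson($"name", $"age")).show()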
On Sun, Jan 6, 2019 at 10:43 PM wrote:
> Hi,
>
> Is it possible to return a custom class from a UDF other than a case
> class?
>
> If so, how can we avoid this exception?
> java.lang.UnsupportedOperationException: Sch
Try running AnalyzeTableCommand on both tables first.
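As a sketch of what that looks like (table and column names are illustrative), ANALYZE TABLE is the SQL entry point to AnalyzeTableCommand:

// Current threshold in bytes; tables estimated below it are broadcast,
// and -1 disables broadcast joins entirely.
spark.conf.get("spark.sql.autoBroadcastJoinThreshold")

// Record size and row-count statistics the optimizer needs for sizing.
spark.sql("ANALYZE TABLE table_a COMPUTE STATISTICS")
spark.sql("ANALYZE TABLE table_b COMPUTE STATISTICS")

// Re-check the physical plan: a small enough table should now appear
// in a BroadcastHashJoin instead of a SortMergeJoin.
spark.table("table_a").join(spark.table("table_b"), "id").explain()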
On Wed, Apr 18, 2018 at 2:57 AM Matteo Cossu wrote:
> Can you check the value for spark.sql.autoBroadcastJoinThreshold?
>
> On 29 March 2018 at 14:41, Vitaliy Pisarev wrote:
>
>> I am looking at the physical plan for the following query:
>
If no pre-built solution exists, writing your own would not be that
difficult. I suggest building it on a parser combinator library such as
FastParse:
http://www.lihaoyi.com/fastparse/
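As a rough sketch against the FastParse 2.x API (this thread predates that release, so treat the exact calls as assumptions), a parser for a comma-separated list of integers looks like:

import fastparse._, NoWhitespace._

// Grammar: one or more digits, repeated with "," separators, to end of input.
def number[_: P]: P[Int] = P(CharIn("0-9").rep(1).!).map(_.toInt)
def intList[_: P]: P[Seq[Int]] = P(number.rep(sep = ",") ~ End)

parse("1,22,333", intList(_)) match {
  case Parsed.Success(values, _) => println(values) // List(1, 22, 333)
  case failure: Parsed.Failure   => println(failure)
}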
Regards,
Kurt
On Tue, Mar 13, 2018 at 7:47 AM Aakash Basu wrote:
> Thanks again for the detailed expla
Hi Kane,
It really depends on your use case. I generally use Parquet because it
seems to have better support beyond Spark. However, if you are dealing with
partitioned Hive tables, the current versions of Spark have an issue where
compression will not be applied. This will be fixed in version 2.3.
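For what it's worth, a minimal sketch of writing Parquet with an explicit codec (the path, codec, and partition column are illustrative):

// Write Parquet with an explicit compression codec rather than relying
// on the session default.
df.write
  .option("compression", "snappy")
  .partitionBy("event_date")
  .mode("overwrite")
  .parquet("/tmp/events_parquet")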
Can you share your code and a sample of your data? Without seeing it, I
can't give a definitive answer, but I can offer some hints. If you have a
column of strings, you should be able to create a new column cast to
Integer. This can be accomplished two ways:
df.withColumn("newColumn", df.someStringCol.cast("integer"))  # column name is a placeholder; the original snippet is truncated here
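The second way, sketched here as an assumption since the message breaks off above, is a SQL cast expression, which reads identically in PySpark and Scala:

# Same placeholder column name as above; CAST returns null for
# strings that do not parse as integers.
df.selectExpr("CAST(someStringCol AS INT) AS newColumn")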