No Java installed? Or can the process not find it? Is JAVA_HOME not set?
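A quick way to check is to verify, from Python, that the process can actually see a Java runtime. A minimal sketch:

import os
import shutil

# JAVA_HOME is what Spark's launcher scripts consult first when starting the JVM.
java_home = os.environ.get("JAVA_HOME")
print("JAVA_HOME:", java_home or "not set")

# shutil.which mirrors how the shell would resolve the java binary from PATH.
java_binary = shutil.which("java")
print("java on PATH:", java_binary or "not found")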
On Fri, 13 Nov 2020 at 23:24, Mich Talebzadeh wrote:
> Hi,
>
> This is basically a simple module
>
> from pyspark import SparkContext
> from pyspark.sql import SQLContext
> from pyspark.sql import HiveContext
> from pyspark.sql
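For reference, since Spark 2.0 SQLContext and HiveContext are superseded by SparkSession, which bundles the same functionality. A minimal sketch of the modern equivalent:

from pyspark.sql import SparkSession

# SparkSession subsumes SQLContext and HiveContext; enableHiveSupport()
# provides what HiveContext used to.
spark = (
    SparkSession.builder
    .appName("simple-module")  # hypothetical name, for illustration
    .enableHiveSupport()
    .getOrCreate()
)
sc = spark.sparkContext  # the underlying SparkContext is still reachable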
I think Sean is right, but in your argument you mention that 'functionality
is sacrificed in favour of the availability of resources'. That is where I
disagree with you and agree with Sean: that claim is mostly not true. You
also mentioned this in your previous posts. The only reason we sometimes
19:19, Sean Owen wrote:
>
>> Yes, it's reasonable to build an uber-jar in development, using Maven/Ivy
>> to resolve dependencies (and of course excluding 'provided' dependencies
>> like Spark), and push that to production. That gives you a static artifact
>> to run that does not
> On Wed, 21 Oct 2020 at 06:34, Wim Van Leuven <wim.vanleu...@highestpoint.biz> wrote:
Sean,
The problem with --packages is that in enterprise settings security might
not allow the data environment to link to the internet, or even to the
internal proxying artefact repository.
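For completeness: what --packages does for spark-submit can also be set as the spark.jars.packages configuration property, which resolves the listed Maven coordinates via Ivy at startup, and is subject to exactly the same network restrictions. A minimal sketch (the coordinate and the internal repository URL are illustrative):

from pyspark.sql import SparkSession

# spark.jars.packages takes comma-separated Maven coordinates and resolves
# them at session startup -- the programmatic twin of --packages.
spark = (
    SparkSession.builder
    .appName("packages-example")
    .config("spark.jars.packages", "org.apache.spark:spark-avro_2.12:3.0.1")
    # In a locked-down environment, point resolution at an internal mirror
    # (hypothetical URL):
    # .config("spark.jars.repositories", "https://artifacts.example.internal/maven")
    .getOrCreate()
)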
Also, weren't uber-jars an antipattern? For some reason I don't like them...
Kind regards
-wim
On Wed, 21 Oct
Hey Mich,
This is a very fair question... I've seen many data engineering teams start
out with Scala because technically it is the best choice for many reasons,
and it is, after all, the language Spark itself is written in.
On the other hand, almost all use cases we see these days are data science
use cases where
Looking at the stack trace, your data from Spark gets serialized to an
ArrayList (of something), whereas in your Scala code you are using an Array
of Rows. So the types don't line up. That's the exception you are seeing:
the JVM searches for a method signature that simply does not exist.
Try to turn the
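For context: Py4J, which PySpark uses to talk to the JVM, auto-converts a Python list into a java.util.ArrayList, so a JVM method declared against an Array will never match. A minimal sketch of the Python side, with a hypothetical helper class:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("py4j-types").getOrCreate()
values = [1, 2, 3]  # a plain Python list

# Py4J ships this across as java.util.ArrayList, so the JVM side must be
# declared against java.util.List (e.g. process(xs: java.util.List[Int])),
# not an Array -- otherwise the JVM finds no matching signature.
# com.example.Helper is purely illustrative and does not ship with Spark.
helper = spark._jvm.com.example.Helper
# helper.process(values)  # would match only a java.util.List signature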
Srsly?
On Sat, 7 Mar 2020 at 03:28, Koert Kuipers wrote:
> i just ran:
> mvn test -fae > log.txt
>
> at the end of log.txt i find it says there are failures:
> [INFO] Spark Project SQL .. FAILURE [47:55 min]
>
> that is not very helpful. what tests failed?
>
>
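One way to find out is to scan the per-module report files Maven writes rather than the summary line. A sketch, assuming the standard target/surefire-reports layout (ScalaTest-driven modules may write their reports elsewhere):

import glob
import xml.etree.ElementTree as ET

# Surefire writes one TEST-*.xml per suite; failure counts are attributes
# on <testsuite>, and each failed test carries a <failure>/<error> child.
for report in glob.glob("**/target/surefire-reports/TEST-*.xml", recursive=True):
    suite = ET.parse(report).getroot()
    if int(suite.get("failures", "0")) or int(suite.get("errors", "0")):
        print(suite.get("name"))
        for case in suite.iter("testcase"):
            if case.find("failure") is not None or case.find("error") is not None:
                print("  FAILED:", case.get("classname"), case.get("name"))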
Ok, good luck!
On Mon, 2 Mar 2020 at 10:04, Hamish Whittal wrote:
> Enrico, Wim (and privately Neil), thanks for the replies. I will give your
> suggestions a whirl.
>
> Basically Wim recommended a pre-processing step to weed out the
> problematic files. I am going to build that into the
Hey Hamish,
I don't think there is an 'automatic fix' for this problem...
Are you reading those as partitions of a single dataset? Or are you
processing them individually?
As your incoming data is apparently not stable, you should implement a
preprocessing step on each file to check and, if
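A minimal sketch of such a check-and-quarantine step (the paths, file format, and directory names are assumptions for illustration):

import shutil
from pathlib import Path
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("preprocess-check").getOrCreate()

incoming = Path("/data/incoming")      # hypothetical landing directory
quarantine = Path("/data/quarantine")  # hypothetical reject directory
good_files = []

for path in sorted(incoming.glob("*.parquet")):
    try:
        # Reading only the schema is enough to catch corrupt or
        # incompatible files without scanning their full contents.
        spark.read.parquet(str(path)).schema
        good_files.append(str(path))
    except Exception:
        # Set unreadable files aside so the main job sees clean input only.
        shutil.move(str(path), str(quarantine / path.name))

df = spark.read.parquet(*good_files)  # process the validated files together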
Hello,
we are writing a lot of data processing pipelines for Spark using pyspark,
and we add a lot of integration tests.
In our enterprise environment, a lot of people run Windows PCs, and we
notice that build times are really slow on Windows because of the
integration tests. These metrics
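One common way to cut pyspark integration-test time, on any OS, is to share a single SparkSession across the whole test run instead of paying the JVM start-up cost per test. A sketch using a session-scoped pytest fixture (fixture and test names are illustrative):

# conftest.py
import pytest
from pyspark.sql import SparkSession

@pytest.fixture(scope="session")
def spark():
    # One JVM/SparkSession for the entire run; starting Spark is usually
    # the dominant per-test cost, especially on Windows.
    session = (
        SparkSession.builder
        .master("local[2]")
        .appName("integration-tests")
        .config("spark.ui.enabled", "false")          # no web UI needed in CI
        .config("spark.sql.shuffle.partitions", "4")  # small test data
        .getOrCreate()
    )
    yield session
    session.stop()

# test_example.py
def test_simple_count(spark):
    assert spark.range(10).count() == 10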