Nothing about Spark depends on a cluster. The Hadoop client libraries are
required because they are part of Spark's API, but there is no need to remove
them if you aren't using YARN. Indeed you can't remove them, but they're just
libraries.
On Sun, Nov 12, 2017, 9:36 PM wrote:
> @Jörn Spark without Hadoop is
Within a CI/CD pipeline I use MiniDFSCluster and MiniYarnCluster if the
production cluster also has HDFS and YARN. This has proven extremely useful
and has caught a lot of errors before they reached the cluster (i.e. it saves
a lot of money).
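The MiniDFSCluster approach above can be sketched roughly as follows. This is only an illustration, assuming the `hadoop-minicluster` test artifact is on the classpath; the path name is made up for the example:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.hadoop.hdfs.MiniDFSCluster

object MiniDfsSmokeTest {
  def main(args: Array[String]): Unit = {
    // Spin up a single-node, in-process HDFS for the duration of the test.
    val conf = new Configuration()
    val cluster = new MiniDFSCluster.Builder(conf).numDataNodes(1).build()
    try {
      val fs = cluster.getFileSystem
      // Write and read back a file, as a stand-in for the code under test.
      val path = new Path("/tmp/smoke-test")
      fs.create(path).close()
      assert(fs.exists(path))
    } finally {
      cluster.shutdown()
    }
  }
}
```

The same pattern applies to MiniYarnCluster for jobs that will run on YARN in production.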
Cf.
@Jörn Spark without Hadoop is useful
- For using Spark's programming model on a single beefy instance
- For testing and integration within a CI/CD pipeline.
It's ugly to have tests that depend on a cluster running somewhere.
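The single-instance case above needs nothing beyond the Spark jars themselves: a local master runs the driver and executors in one JVM, with no cluster manager involved. A minimal sketch (the app name is arbitrary):

```scala
import org.apache.spark.sql.SparkSession

object LocalModeExample {
  def main(args: Array[String]): Unit = {
    // "local[*]" runs Spark in-process using all available cores;
    // no cluster, no YARN, no HDFS required.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("single-node-example")
      .getOrCreate()
    import spark.implicits._

    val df = Seq((1, "a"), (2, "b"), (3, "c")).toDF("id", "label")
    assert(df.count() == 3)
    spark.stop()
  }
}
```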
On Sun, 12 Nov 2017 at 17:17 Jörn Franke
Bago,
Finally I am able to create one which fails consistently. I think the issue
is caused by the VectorAssembler in the model. In the new code, I have 2
features (1 text and 1 number), and I have to run them through a
VectorAssembler before passing them to LogisticRegression. Code and test
data below.
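The actual code and data are attached elsewhere in the thread, so here is only a hedged sketch of the shape being described: one text and one numeric feature, where the text column first has to be turned into a vector (VectorAssembler itself accepts only numeric, boolean, and vector columns), then both are assembled and fed to LogisticRegression. The column names are invented for the example:

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.{HashingTF, Tokenizer, VectorAssembler}

// Hypothetical schema: (text: String, amount: Double, label: Double)
val tokenizer = new Tokenizer()
  .setInputCol("text").setOutputCol("words")
val hashingTF = new HashingTF()
  .setInputCol("words").setOutputCol("textVec").setNumFeatures(1000)

// Combine the vectorized text with the raw numeric feature.
val assembler = new VectorAssembler()
  .setInputCols(Array("textVec", "amount"))
  .setOutputCol("features")

val lr = new LogisticRegression()
  .setLabelCol("label").setFeaturesCol("features")

val pipeline = new Pipeline()
  .setStages(Array(tokenizer, hashingTF, assembler, lr))
// val model = pipeline.fit(trainingDF)  // trainingDF: the poster's DataFrame
```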
hey all, i'm finally back from vacation this week and will be following up
once i whittle down my inbox.
in summation: jenkins worker upgrades will be happening. the biggest one
is the move to ubuntu... we need containerized builds for this, but i
don't have the cycles to really do all of this
Hello All,
I have Spark Dataframe with timestamp from 2015-10-07 19:36:59 to
2017-01-01 18:53:23
If I want to split this Dataframe into 3 parts, I wrote the code below to
split it. Can anyone please confirm whether this is the correct approach?
val finalDF1 =
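The `val finalDF1 = ...` line is truncated in the message, but a common way to split on a timestamp column into three contiguous ranges is plain filters on boundary timestamps. A hedged sketch, assuming the column is named `ts` and with cut points chosen arbitrarily for illustration:

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

// df has a timestamp column "ts" spanning 2015-10-07 .. 2017-01-01.
// The two cut points below are illustrative, not computed from the data.
def splitByTime(df: DataFrame): (DataFrame, DataFrame, DataFrame) = {
  val cut1 = "2016-03-01 00:00:00"
  val cut2 = "2016-08-01 00:00:00"
  val part1 = df.filter(col("ts") < cut1)
  val part2 = df.filter(col("ts") >= cut1 && col("ts") < cut2)
  val part3 = df.filter(col("ts") >= cut2)
  (part1, part2, part3)
}
```

If the goal is three roughly equal-sized parts rather than specific time ranges, `df.randomSplit(Array(1.0, 1.0, 1.0))` is the usual tool instead.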
Why do you even mind?
> On 11. Nov 2017, at 18:42, Cristian Lorenzetto
> wrote:
>
> Considering the case where I don't need HDFS, is there a way to remove
> Hadoop completely from Spark?
> Is YARN the only dependency in Spark?
> is there no java or scala (jdk