Re: is there a way for removing hadoop from spark

2017-11-12 Thread Sean Owen
Nothing about Spark depends on a cluster. The Hadoop client libs are required because they are part of the API, but there is no need to remove them if you aren't using YARN. Indeed you can't remove them, but they're just libs.
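For reference, Spark also ships a "Hadoop free" binary build; even with that build the Hadoop client jars stay on the classpath (they are part of Spark's API surface), supplied via spark-env.sh rather than bundled. A minimal conf/spark-env.sh fragment, assuming a local Hadoop install whose `hadoop` command is on PATH; no HDFS or YARN daemons need to be running:

```shell
# conf/spark-env.sh -- point a "Hadoop free" Spark build at the
# Hadoop client jars; no cluster (HDFS/YARN) has to be running.
export SPARK_DIST_CLASSPATH=$(hadoop classpath)
```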

Re: is there a way for removing hadoop from spark

2017-11-12 Thread Jörn Franke
Within a CI/CD pipeline I use MiniDFSCluster and MiniYarnCluster if the production cluster also has HDFS and YARN. It has proven extremely useful and has caught a lot of errors before they reached the cluster (i.e. it saves a lot of money). Cf.
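MiniDFSCluster and MiniYarnCluster are JVM test fixtures from the hadoop-minicluster artifact, so they run inside the test process itself. The shape of such a test (start an embedded stand-in for the external service, exercise the code under test against it, tear it down) can be sketched in plain Python, with a temporary directory playing the role of the filesystem; the class and helper names here are illustrative, not Hadoop API:

```python
import shutil
import tempfile
import unittest
from pathlib import Path

class EmbeddedFsTest(unittest.TestCase):
    """Same pattern as a MiniDFSCluster-based test: bring up an embedded
    stand-in for the external service in setUp, run the code under test
    against it, and tear it down afterwards -- no real cluster needed."""

    def setUp(self):
        # Plays the role of MiniDFSCluster.Builder(conf).build()
        self.fs_root = Path(tempfile.mkdtemp(prefix="mini-fs-"))

    def tearDown(self):
        # Plays the role of cluster.shutdown()
        shutil.rmtree(self.fs_root)

    def test_write_then_read(self):
        target = self.fs_root / "part-00000"
        target.write_text("hello")  # the code under test would write here
        self.assertEqual(target.read_text(), "hello")
```

Because the fixture lives and dies with the test, the CI job never depends on "a cluster running somewhere", which is exactly the objection raised in this thread.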

Re: is there a way for removing hadoop from spark

2017-11-12 Thread trsell
@Jörn Spark without Hadoop is useful: for using Spark's programming model on a single beefy instance, and for testing and integration in a CI/CD pipeline. It's ugly to have tests which depend on a cluster running somewhere.

Re: HashingTFModel/IDFModel in Structured Streaming

2017-11-12 Thread Davis Varghese
Bago, finally I am able to create one which fails consistently. I think the issue is caused by the VectorAssembler in the model. In the new code, I have 2 features (1 text and 1 number), and I have to run them through a VectorAssembler before feeding them to LogisticRegression. Code and test data below

Re: Jenkins upgrade/Test Parallelization & Containerization

2017-11-12 Thread shane knapp
hey all, i'm finally back from vacation this week and will be following up once i whittle down my inbox. in summation: jenkins worker upgrades will be happening. the biggest one is the move to ubuntu... we need containerized builds for this, but i don't have the cycles to really do all of this

Divide Spark Dataframe to parts by timestamp

2017-11-12 Thread Chetan Khatri
Hello All, I have a Spark DataFrame with timestamps from 2015-10-07 19:36:59 to 2017-01-01 18:53:23. I want to split this DataFrame into 3 parts, and I wrote the code below to do it. Can anyone please confirm whether this is the correct approach? val finalDF1 =
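The code in the archived message is truncated, but the usual approach is to compute equal time boundaries and build one filter per window. The boundary arithmetic can be sketched in plain Python with the stdlib (this stands in for the Spark filter conditions; the function name is illustrative):

```python
from datetime import datetime

def split_boundaries(start: datetime, end: datetime, parts: int):
    """Return parts + 1 timestamps that cut [start, end] into equal windows."""
    step = (end - start) / parts
    # Pin the final boundary to `end` exactly, avoiding microsecond
    # rounding drift from the timedelta division above.
    return [start + i * step for i in range(parts)] + [end]

start = datetime(2015, 10, 7, 19, 36, 59)
end = datetime(2017, 1, 1, 18, 53, 23)
b = split_boundaries(start, end, 3)
# Part i of the DataFrame would then be
#   df.filter(ts_col >= b[i] && ts_col < b[i + 1])
# with the last part using <= so the end timestamp is included.
```

Adjacent boundary pairs define non-overlapping predicates, so the three filtered DataFrames together cover the full range with no duplicated rows.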

Re: is there a way for removing hadoop from spark

2017-11-12 Thread Jörn Franke
Why do you even mind? > On 11 Nov 2017, at 18:42, Cristian Lorenzetto > wrote: > > Considering the case where I don't need HDFS, is there a way to remove > Hadoop completely from Spark? > Is YARN the only dependency in Spark? > is there no java or scala (jdk