Re: [Beginner] How to save Kafka Dstream data to parquet ?

2018-03-05 Thread Sunil Parmar
We use Impala to access the parquet files in the directories. Any pointers on achieving at-least-once semantics with Spark Streaming, or on avoiding partial files? Sunil Parmar On Fri, Mar 2, 2018 at 2:57 PM, Tathagata Das wrote: > Structured Streaming's file sink solves these
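TD's reply above points at the Structured Streaming file sink. A minimal sketch of that approach is below; the broker address, topic name, and paths are hypothetical, and the checkpoint location is what gives the file sink its end-to-end fault-tolerance guarantees (it writes a log of committed files, which downstream readers such as Impala can rely on):

```scala
import org.apache.spark.sql.SparkSession

// Sketch: Kafka source -> parquet file sink with checkpointing.
// Broker, topic, and paths below are placeholders, not from the thread.
object KafkaToParquet {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("KafkaToParquet").getOrCreate()

    val stream = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9092")
      .option("subscribe", "events")
      .load()
      .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")

    stream.writeStream
      .format("parquet")
      .option("path", "/data/events")                 // parquet output directory
      .option("checkpointLocation", "/chk/events")    // commit log lives here
      .start()
      .awaitTermination()
  }
}
```

Because the sink records committed files in its metadata log, readers that honor that log see only complete files, which addresses the partial-file concern.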

OutOfDirectMemoryError for Spark 2.2

2018-03-05 Thread Chawla,Sumit
Hi All, I have a job which processes a large dataset. All items in the dataset are unrelated. To save on cluster resources, I process these items in chunks. Since chunks are independent of each other, I start and shut down the Spark context for each chunk. This allows me to keep the DAG smaller
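A minimal sketch of the start/stop-per-chunk pattern described above (chunk paths and the transformation are illustrative, not from the thread). Note that repeatedly creating and stopping contexts in one long-lived JVM is exactly the kind of workload where native/direct memory that is not fully released per cycle can accumulate:

```scala
import org.apache.spark.sql.SparkSession

// Sketch: one short-lived session per independent chunk, keeping each DAG small.
// Paths and the groupBy are placeholders.
val chunkPaths = Seq("/data/chunk-0", "/data/chunk-1")

chunkPaths.foreach { path =>
  val spark = SparkSession.builder().appName(s"chunk-$path").getOrCreate()
  try {
    spark.read.parquet(path)
      .groupBy("key").count()
      .write.mode("overwrite").parquet(s"$path-out")
  } finally {
    spark.stop() // release executors before starting the next chunk
  }
}
```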

Spark Higher order function

2018-03-05 Thread Selvam Raman
Dear All, I read about higher-order functions in the Databricks blog. https://docs.databricks.com/spark/latest/spark-sql/higher-order-functions-lambda-functions.html Is higher-order function support available in our (open-source) Spark? -- Selvam Raman "Shun bribery; stand tall"
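At the time of this thread the SQL higher-order functions from that blog post were Databricks-only; they landed in open-source Spark with the 2.4 release. A small sketch of what that looks like once available (column name and values are illustrative):

```scala
// Requires open-source Spark 2.4+; not available in the 2.3 line.
import spark.implicits._

val df = Seq(Seq(1, 2, 3)).toDF("values")
// transform(array, lambda) applies the lambda to each element.
df.selectExpr("transform(values, x -> x * x) AS squared").show()
```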

Dynamic Resource Allocation - session stuck

2018-03-05 Thread Marinov, Slavi (London)
Hello, I am playing with DRA, initially just trying to get a feel for its functionality and limitations and to get the basics working. Spark 2.2.0 is running on Mesos (in turn on Zookeeper). I am running this very simple snippet:
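For reference, a typical spark-defaults.conf fragment for dynamic resource allocation; the executor counts and timeout are illustrative values, not recommendations. On Mesos, DRA additionally requires the external shuffle service to be running on each agent node, which is a common reason sessions appear stuck:

```
spark.dynamicAllocation.enabled              true
spark.shuffle.service.enabled                true
spark.dynamicAllocation.minExecutors         1
spark.dynamicAllocation.maxExecutors         20
spark.dynamicAllocation.executorIdleTimeout  60s
```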

Spark+AI Summit 2018 - San Francisco June 4-6, 2018

2018-03-05 Thread Scott walent
Early Bird pricing ends on Friday. Book now to save $200+. The full agenda is available at www.databricks.com/sparkaisummit

broken UI in 2.3?

2018-03-05 Thread Nan Zhu
Hi all, I am experiencing some issues in the UI when using 2.3. When I clicked the executor/storage tab, I got the following exception: java.lang.NullPointerException at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:388) at

Re: Spark scala development in Sbt vs Maven

2018-03-05 Thread Anthony May
We use sbt for easy cross-project dependencies with multiple Scala versions in a mono-repo, for which it is pretty good, albeit with some quirks. As our projects have matured and change less, we moved away from cross-project dependencies, but they were extremely useful early in the projects. We knew that a

Re: Spark scala development in Sbt vs Maven

2018-03-05 Thread Sean Owen
Spark uses Maven as the primary build, but SBT works as well. It reads the Maven build to some extent. Zinc incremental compilation works with Maven (with the Scala plugin for Maven). Myself, I prefer Maven, for some of the reasons it is the main build in Spark: declarative builds end up being a

Re: Spark scala development in Sbt vs Maven

2018-03-05 Thread Jörn Franke
I think most of the Scala development in Spark happens with sbt - in the open-source world. However, you can do it with Gradle and Maven as well; it depends on what your organization's standard is. Some things might be more cumbersome to reach in non-sbt Scala scenarios, but this is

Spark scala development in Sbt vs Maven

2018-03-05 Thread Swapnil Shinde
Hello, SBT's incremental compilation has been a huge plus for building Spark+Scala applications in SBT for some time. It seems Maven can also support incremental compilation with a Zinc server. Considering that, I am interested to know the community's experience - 1. Spark documentation says SBT is being used
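The Zinc-backed incremental compilation mentioned above is configured through the scala-maven-plugin. An illustrative pom.xml fragment (the version number is an example from that era, not a recommendation):

```xml
<!-- Sketch: scala-maven-plugin with incremental (Zinc-style) recompilation. -->
<plugin>
  <groupId>net.alchim31.maven</groupId>
  <artifactId>scala-maven-plugin</artifactId>
  <version>3.2.2</version>
  <configuration>
    <recompileMode>incremental</recompileMode>
  </configuration>
  <executions>
    <execution>
      <goals>
        <goal>compile</goal>
        <goal>testCompile</goal>
      </goals>
    </execution>
  </executions>
</plugin>
```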

Properly stop applications or jobs within the application

2018-03-05 Thread Behroz Sikander
Hello, We are using spark-jobserver to spawn jobs in a Spark cluster. We have recently faced issues with zombie jobs in the cluster. This normally happens when the job is accessing external resources like Kafka/C* and something goes wrong while consuming them. For example, if suddenly a topic
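One way to make such jobs stoppable from a supervisor (e.g. a job server) is to tag work with a job group and cancel the group when it turns zombie. A sketch, with a hypothetical group id:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("cancellable").getOrCreate()
val sc = spark.sparkContext

// Tag everything launched from this thread; interruptOnCancel asks Spark to
// interrupt the task threads, which helps unblock hung external consumers.
sc.setJobGroup("ingest-topic-a", "consume topic A", interruptOnCancel = true)
// ... run actions here ...

// Later, from a monitoring thread, if the job hangs:
sc.cancelJobGroup("ingest-topic-a")
```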

[ML] RandomForestRegressor training set size for each trees

2018-03-05 Thread OBones
We are using RandomForestRegressor from Spark 2.1.1 to train a model. To make sure we have the appropriate parameters, we start with a very small dataset, one that has 6024 lines. The regressor is created with this code: val rf = new RandomForestRegressor() .setLabelCol("MyLabel")
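On the subject-line question: the fraction of the input each tree trains on is governed by the subsamplingRate parameter (bagging with replacement; the default is 1.0, i.e. a bootstrap sample the size of the full dataset). A sketch, with the label column taken from the thread and the other values illustrative:

```scala
import org.apache.spark.ml.regression.RandomForestRegressor

val rf = new RandomForestRegressor()
  .setLabelCol("MyLabel")
  .setFeaturesCol("features")
  .setNumTrees(50)
  // Each tree is trained on a sample of 80% of the rows, drawn with replacement.
  .setSubsamplingRate(0.8)
```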