We use Impala to access the Parquet files in those directories. Any pointers on
achieving at-least-once semantics with Spark Streaming, or on handling partial files?
Sunil Parmar
On Fri, Mar 2, 2018 at 2:57 PM, Tathagata Das wrote:
> Structured Streaming's file sink solves these
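For context, a minimal sketch of the file sink being referred to (the paths and schema are illustrative assumptions, not from the thread). The sink records each batch's completed files in a _spark_metadata log under the output directory, which is how Spark readers avoid seeing partial output; external readers such as Impala scan the directory directly and do not consult that log:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("file-sink-demo").getOrCreate()

// Assumed source directory and schema, for illustration only.
val input = spark.readStream
  .schema("id LONG, value STRING")
  .json("/data/incoming")

// The parquet file sink commits finished files to _spark_metadata; together
// with the checkpoint this gives exactly-once output for Spark readers.
val query = input.writeStream
  .format("parquet")
  .option("path", "/data/warehouse/events")
  .option("checkpointLocation", "/data/checkpoints/events")
  .start()

query.awaitTermination()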
Hi All
I have a job which processes a large dataset. All items in the dataset are
unrelated. To save on cluster resources, I process these items in chunks.
Since the chunks are independent of each other, I start and shut down the
Spark context for each chunk. This keeps the DAG small.
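A rough sketch of the pattern described above (the chunk paths and the processing step are placeholders, not the actual job):

import org.apache.spark.sql.SparkSession

// Hypothetical chunk locations; each chunk gets a fresh session so lineage
// from one chunk does not accumulate into the next.
val chunkPaths = Seq("/data/chunk-0", "/data/chunk-1", "/data/chunk-2")

for (path <- chunkPaths) {
  val spark = SparkSession.builder().appName(s"process-$path").getOrCreate()

  spark.read.parquet(path)
    .groupBy("key")                 // placeholder for the real per-chunk work
    .count()
    .write.mode("overwrite").parquet(path + "-out")

  spark.stop()                      // release resources before the next chunk
}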
Dear All,
I read about higher-order functions in the Databricks blog:
https://docs.databricks.com/spark/latest/spark-sql/higher-order-functions-lambda-functions.html
Is higher-order function support available in open-source Spark?
--
Selvam Raman
"லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"
Hello,
I am playing with dynamic resource allocation (DRA), initially just trying to
get a feel for the functionality and limitations and to get the basics working.
Spark is running on Mesos (in turn on ZooKeeper). Spark is version 2.2.0.
I am running this very simple snippet:
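A minimal sketch of the settings such a test typically involves (the master URL and all values below are illustrative assumptions, not the poster's actual snippet):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("mesos://zk://zk1:2181,zk2:2181/mesos")   // hypothetical Mesos-on-ZooKeeper URL
  .appName("dra-test")
  .config("spark.dynamicAllocation.enabled", "true")
  .config("spark.shuffle.service.enabled", "true")  // DRA requires the external shuffle service
  .config("spark.dynamicAllocation.minExecutors", "1")
  .config("spark.dynamicAllocation.maxExecutors", "10")
  .getOrCreate()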
Early Bird pricing ends on Friday. Book now to save $200+.
The full agenda is available: www.databricks.com/sparkaisummit
Hi, all
I am experiencing some issues in the UI when using 2.3.
When I click the executor/storage tab, I get the following exception:

java.lang.NullPointerException
    at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:388)
    at
We use sbt for easy cross-project dependencies with multiple Scala versions
in a mono-repo, for which it is pretty good, albeit with some quirks. As our
projects have matured and change less, we have moved away from cross-project
dependencies, but they were extremely useful early on (see the sketch below).
We knew that a
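A minimal build.sbt sketch of the kind of cross-Scala-version, cross-project setup described above (project names and version numbers are hypothetical):

// build.sbt (sbt 1.x)
lazy val commonSettings = Seq(
  organization := "com.example",
  scalaVersion := "2.11.12",
  crossScalaVersions := Seq("2.11.12", "2.12.8")
)

lazy val core = (project in file("core"))
  .settings(commonSettings)

lazy val jobs = (project in file("jobs"))
  .settings(commonSettings)
  .dependsOn(core)   // cross-project dependency within the mono-repo

Running "+compile" or "+test" then builds every project against each version listed in crossScalaVersions.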
Spark uses Maven as the primary build, but SBT works as well; the SBT build
reads the Maven build to some extent.
Zinc incremental compilation works with Maven (via the Scala plugin for Maven).
Myself, I prefer Maven, for some of the reasons it is the main build in
Spark: declarative builds end up being a
I think most of the Scala development on Spark happens with sbt, in the
open-source world. However, you can do it with Gradle and Maven as well; it
depends on your organization and what its standard is.
Some things might be more cumbersome to reach in non-sbt Scala scenarios, but
this is
Hello
SBT's incremental compilation was a huge plus for building Spark + Scala
applications with SBT for some time. It seems Maven can also support
incremental compilation via the Zinc server. Considering that, I am interested
in the community's experience:
1. Spark documentation says SBT is being used
Hello,
We are using spark-jobserver to spawn jobs in a Spark cluster. We have
recently faced issues with zombie jobs in the cluster. This normally happens
when a job is accessing external resources like Kafka/C* and something goes
wrong while consuming them. For example, if suddenly a topic
We are using RandomForestRegressor from Spark 2.1.1 to train a model.
To make sure we have the appropriate parameters, we start with a very
small dataset, one that has 6024 lines. The regressor is created with
this code:

import org.apache.spark.ml.regression.RandomForestRegressor

val rf = new RandomForestRegressor().setLabelCol("MyLabel")
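A sketch of how such a regressor is then typically fit (the features column and dataset names below are hypothetical, not from the original message):

// Assumes trainingData is a DataFrame with a "features" vector column
// and the "MyLabel" label column.
val model = rf
  .setFeaturesCol("features")
  .fit(trainingData)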