Lars Albertsson
Data engineering entrepreneur
www.scling.com, www.mapflat.com
https://twitter.com/lalleal
+46 70 7687109
On Tue, Feb 25, 2020 at 7:46 PM Ruijing Li wrote:
>
> Just wanted to follow up on this. If anyone has any advice, I’d be interested
> in learning more!
>
> On Th
on the subject. Slides and video
are linked on this page: http://www.mapflat.com/presentations/
You can find more material in this list of resources:
http://www.mapflat.com/lands/resources/reading-list
Happy testing!
Regards,
Lars Albertsson
Data engineering entrepreneur
www.mimeria.com
in this list of resources:
http://www.mapflat.com/lands/resources/reading-list
Happy testing!
Regards,
Lars Albertsson
Data engineering consultant
www.mapflat.com
https://twitter.com/lalleal
+46 70 7687109
Calendar: http://www.mapflat.com/calendar
On Mon, May 21, 2018 at 2:24 PM, Steve
. Validate
selected fields instead.
For a longer answer, please search for my previous posts to the user
list, or watch this presentation: https://vimeo.com/192429554
Slides at
http://www.slideshare.net/lallea/test-strategies-for-data-processing-pipelines-67244458
Regards,
Lars Albertsson
Data
.
Or do you want to use DI for other reasons?
Lars Albertsson
Data engineering consultant
www.mapflat.com
https://twitter.com/lalleal
+46 70 7687109
Calendar: https://goo.gl/6FBtlS, https://freebusy.io/la...@mapflat.com
On Fri, Dec 23, 2016 at 11:56 AM, Chetan Khatri
<chetan.opensou...@gmail.
on the subject. There is a video
recording at https://vimeo.com/192429554 and slides at
http://www.slideshare.net/lallea/test-strategies-for-data-processing-pipelines-67244458
You can find more material on test strategies at
http://www.mapflat.com/lands/resources/reading-list/index.html
Lars Albertsson
be
addressed; if you induce failures, system failures would become part
of normal operations, and real failures risk passing unnoticed.
Regards,
Lars Albertsson
Data engineering consultant
www.mapflat.com
https://twitter.com/lalleal
+46 70 7687109
Calendar: https://goo.gl/6FBtlS
On Thu, Jul 28
You can find useful discussions in the list archives. I wrote this, which
might help you:
https://www.mail-archive.com/user%40spark.apache.org/msg48032.html
Regards,
Lars Albertsson
Data engineering consultant
www.mapflat.com
+46 70 7687109
Calendar: https://goo.gl/tV2hWF
On Jun 29, 2016 07:02
ading-list/
Regards,
Lars Albertsson
Data engineering consultant
www.mapflat.com
https://twitter.com/lalleal
+46 70 7687109
Calendar: https://goo.gl/6FBtlS
On Wed, Jul 20, 2016 at 3:47 PM, Sathish Kumaran Vairavelu
<vsathishkuma...@gmail.com> wrote:
> If you are using Mesos, then u can use C
You can use a workflow manager, which gives you tools to handle transient
failures in data pipelines. I suggest either Luigi or Airflow. They provide
DSLs embedded in Python, so if the primitives provided are insufficient, it
is easy to customise Spark tasks with restart logic.
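As a rough illustration of what such restart logic could look like, here is a minimal sketch in plain Python. It does not use the Luigi or Airflow APIs; the function name run_with_retries and its parameters are made up for this example, and the job class and jar in the usage note are placeholders.

```python
import subprocess
import time


def run_with_retries(command, max_attempts=3, backoff_s=10.0,
                     runner=subprocess.run, sleep=time.sleep):
    """Run a command, retrying on a non-zero exit code.

    Returns the attempt number that succeeded. The runner and sleep
    arguments are injectable so the logic can be tested without
    starting a real process. All names here are hypothetical.
    """
    for attempt in range(1, max_attempts + 1):
        result = runner(command)
        if result.returncode == 0:
            return attempt
        if attempt < max_attempts:
            # Linear backoff: wait longer after each failed attempt.
            sleep(backoff_s * attempt)
    raise RuntimeError(
        f"command failed after {max_attempts} attempts: {command}")
```

A workflow task could then call something like run_with_retries(["spark-submit", "--class", "com.example.MyJob", "my-job.jar"]), where the class and jar names are placeholders. Retrying only makes sense if the job is safe to rerun, e.g. it overwrites its output rather than appending to it.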
Regards,
Lars
that runs smoothly from Gradle/Maven/SBT and also from
IntelliJ.
I hope things are clearer. Let me know if you have further questions.
Regards,
Lars Albertsson
Data engineering consultant
www.mapflat.com
+46 70 7687109
Calendar: https://goo.gl/6FBtlS
On Thu, Jul 7, 2016 at 3:14 AM,
with Docker Compose. If you are emitting
database entries, your test oracle will need to frequently poll the
database for the expected records, with a timeout in order not to hang
on failing tests.
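A minimal sketch of such a polling oracle, in Python, might look as follows. The helper name wait_for_records and the fetch callable are assumptions made for this example; fetch stands in for whatever query returns the records currently in the database.

```python
import time


def wait_for_records(fetch, expected, timeout_s=30.0, interval_s=0.5):
    """Poll fetch() until all expected records are present.

    fetch is any zero-argument callable returning the records currently
    visible (hypothetical; in a real test it would query the database).
    Raises AssertionError on timeout so that a failing test does not hang.
    """
    expected = set(expected)
    deadline = time.monotonic() + timeout_s
    while True:
        found = set(fetch())
        if expected <= found:
            return found
        if time.monotonic() >= deadline:
            missing = expected - found
            raise AssertionError(
                f"timed out waiting for records: {missing!r}")
        time.sleep(interval_s)
```

The timeout bounds how long a failing test takes, while a short poll interval keeps passing tests fast.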
I hope this is comprehensible. Let me know if you have follow-up questions.
Regards,
Lars Alberts
>
> On Wed, Mar 30, 2016 at 2:41 AM, Lars Albertsson <la...@mapflat.com>
> wrote:
>
>> Thanks!
>>
>> It is on my backlog to write a couple of blog posts on the topic, and
>> eventually some example code, but I am currently busy with clients.
>>
>>
Hi,
I wrote a longish mail on Spark testing strategy last month, which you
may find useful:
http://mail-archives.apache.org/mod_mbox/spark-user/201603.mbox/browser
Let me know if you have follow-up questions or want assistance.
Regards,
Lars Albertsson
Data engineering consultant
Thanks!
It is on my backlog to write a couple of blog posts on the topic, and
eventually some example code, but I am currently busy with clients.
Thanks for the pointer to Eventually - I was unaware. Fast exit on
exception would be a useful addition, indeed.
Lars Albertsson
Data engineering
with some expiration
strategy. :-)
Regards,
Lars Albertsson
Data engineering consultant
www.mapflat.com
+46 70 7687109
On Fri, Mar 25, 2016 at 7:48 AM, Jatin Kumar <jku...@rocketfuelinc.com> wrote:
> Hello Lars,
>
> Thanks for your email. I tried exactly what you said and it d
case, the data structures ended up being small, on the
order of tens or hundreds of megabytes. It varies with use case, but
it is probably a path worth investigating if approximate results are
acceptable.
Regards,
Lars Albertsson
Data engineering consultant
www.mapflat.com
+46 70 7687109
On Wed
with different combinations of
time windows by pushing out CMSs and heavy hitters to e.g. Kafka, and
have different stream processors that aggregate different time windows
and push results to Kafka or to lookup tables.
Lars Albertsson
Data engineering consultant
www.mapflat.com
+46 70 7687109
On Tue, Mar
presentation by Ted Dunning and Mikio Braun,
who have held good presentations on the subject.
There are AFAIK two open source implementations of Count-Min Sketch,
one of them in Algebird.
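To give a feeling for how the data structure works, here is a small Count-Min Sketch written from scratch in Python. It is an illustrative sketch only, not the Algebird implementation; the class name, the default width and depth, and the salted SHA-256 hashing are choices made for this example.

```python
import hashlib


class CountMinSketch:
    """Approximate counter: estimates never undercount, but may overcount."""

    def __init__(self, width=1024, depth=4):
        self.width = width
        self.depth = depth
        self.rows = [[0] * width for _ in range(depth)]

    def _buckets(self, item):
        # One bucket per row, each chosen by a differently salted hash.
        for row in range(self.depth):
            digest = hashlib.sha256(f"{row}:{item}".encode("utf-8")).digest()
            yield row, int.from_bytes(digest[:8], "big") % self.width

    def add(self, item, count=1):
        for row, col in self._buckets(item):
            self.rows[row][col] += count

    def estimate(self, item):
        # Collisions only inflate counters, so the minimum is the best guess.
        return min(self.rows[row][col] for row, col in self._buckets(item))

    def merge(self, other):
        """Return a new sketch combining two sketches of equal shape."""
        if (self.width, self.depth) != (other.width, other.depth):
            raise ValueError("sketch dimensions differ")
        merged = CountMinSketch(self.width, self.depth)
        merged.rows = [[a + b for a, b in zip(mine, theirs)]
                       for mine, theirs in zip(self.rows, other.rows)]
        return merged
```

Since sketches of the same shape can be merged by adding counters, sketches built for short time windows can be combined into longer windows without reprocessing the events.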
Let me know if anything is unclear.
Good luck, and let us know how it goes.
Regards,
Lars Albertss
clarifications or assistance.
Regards,
Lars Albertsson
Data engineering consultant
www.mapflat.com
+46 70 7687109
On Wed, Mar 2, 2016 at 6:54 PM, SRK <swethakasire...@gmail.com> wrote:
> Hi,
>
> What is a good unit testing framework for Spark batch/streaming jobs? I have
> cor
.
Let me know if you have follow-up questions, or want assistance.
Regards,
Lars Albertsson
Data engineering consultant
www.mapflat.com
+46 70 7687109
On Tue, Jan 26, 2016 at 10:25 PM, Daniel Schulz
<danielschulz2...@hotmail.com> wrote:
> Hi,
>
> We are currently working on a soluti
the results with the list!
Regards,
Lars Albertsson
On Thu, Oct 22, 2015 at 10:48 PM, Nipun Arora <nipunarora2...@gmail.com> wrote:
> Hi,
> In general in spark stream one can do transformations ( filter, map etc.) or
> output operations (collect, forEach) etc. in an event-driven
? That will require
additional components.
This became a bit of a brain dump on the topic. I hope that it is
useful. Don't hesitate to get back if I can help.
Regards,
Lars Albertsson
On Fri, Aug 7, 2015 at 5:43 PM, Vikram Kone vikramk...@gmail.com wrote:
Hi,
I'm looking for open source workflow tools
The snippet at the end worked for me. We run Spark 1.3.x, so
DataFrame.drop is not available to us.
As pointed out by Yana, DataFrame operations typically return a new
DataFrame, so use as such:
import com.foo.sparkstuff.DataFrameOps._
...
val df = ...
val prunedDf = df.dropColumns(one_col,