Hi all,
we have two environments for a Spark Streaming job, which consumes a Kafka
topic to do calculations.
In one environment, the Spark Streaming job consumed non-standard data from
Kafka and threw an exception (not caught in the code), and then the
streaming job went down.
But in another environment,
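One defensive pattern for this failure mode (a sketch under assumptions, not from this thread: the JSON parsing and the pipeline line are placeholders) is to catch per-record errors inside the transformation so a single malformed message cannot take the whole job down:
```
import json

def parse_record(raw):
    """Return the parsed payload, or None if the record is malformed,
    instead of letting the exception propagate and kill the job."""
    try:
        return json.loads(raw)
    except (ValueError, TypeError):
        return None

# Applied inside the stream, e.g.:
#   parsed = lines.map(parse_record).filter(lambda r: r is not None)
```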
Unfortunately, no. Honestly, it does not make sense: for type-aware
operations like map, mapGroups, etc., you have to provide an actual JVM
function, and that does not fit in with the SQL language structure.
On Mon, Apr 16, 2018 at 7:34 PM, kant kodali wrote:
> Hi All,
>
> can
Hi All,
I have a Python file which I am executing directly with the spark-submit
command.
Inside the Python file, I have SQL written using HiveContext. I created a
generic variable for the database name inside the SQL.
The problem is: how can I pass the value for this variable dynamically,
just as we
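One common approach (a minimal sketch; the script name, table name, and argument position are assumptions, not from the thread) is to pass the database name on the spark-submit command line and substitute it into the SQL string:
```
# Run as:  spark-submit my_job.py my_database
import sys
from pyspark.sql import SparkSession

db_name = sys.argv[1]  # database name passed as the first argument

# enableHiveSupport() plays the role of the old HiveContext
spark = SparkSession.builder.enableHiveSupport().getOrCreate()

# Substitute the database name into the query (input validation elided)
spark.sql("SELECT * FROM {}.my_table".format(db_name)).show()
```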
Hi All,
can we use mapGroupsWithState in raw SQL? or is it in the roadmap?
Thanks!
Hello,
I got a message saying that messages sent to me (my Gmail id) from the
mailing list got bounced?
Wonder why?
thanks,
Prasad.
On Mon, Apr 16, 2018 at 6:16 PM, wrote:
> Hi! This is the ezmlm program. I'm managing the
> user@spark.apache.org mailing list.
the Spark job succeeds (and with correct output), except there is always an
extra part-* file, and it is empty...
I even set the number of partitions to only 2 via spark-submit, but there is
still a third, empty part-file that shows up.
Why does it do that? How do I fix it?
Thank you
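An empty part-* file usually comes from a task whose partition held no records. A minimal sketch of the usual workaround (the output path and the range data are placeholders): reduce the number of partitions right before the write, rather than via spark-submit configuration, so the write produces exactly that many part files:
```
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.range(100)  # stand-in for the real job's output

# Coalesce at the write, so at most 2 part-* files are produced
# regardless of how the upstream stages were partitioned
df.coalesce(2).write.mode("overwrite").csv("/tmp/out")  # path is a placeholder
```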
Hi Szuromi,
We manage the external shuffle service with Marathon, not manually.
Sometimes, though, e.g. when adding a new node to the cluster, there is some
delay between Mesos scheduling tasks on a slave and Marathon scheduling the
external shuffle service task on that node.
Hi Gerard,
"If your actual source is Kafka, the original solution of using
`spark.streams.awaitAnyTermination` should solve the problem."
I tried literally everything; nothing worked out.
1) Tried nc from two different ports for two different streams; still nothing
worked.
2) Tried the same using
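For reference, a minimal sketch of the quoted suggestion (topic names and the broker address are placeholders, and the spark-sql-kafka package is assumed to be on the classpath): start both queries without blocking, then wait on the shared StreamingQueryManager so neither query's awaitTermination() call stops the other from starting.
```
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("two-streams").getOrCreate()

def kafka_stream(topic):
    return (spark.readStream.format("kafka")
            .option("kafka.bootstrap.servers", "localhost:9092")
            .option("subscribe", topic)
            .load())

# start() is non-blocking, so both queries actually get started
q1 = kafka_stream("topic1").writeStream.format("console").start()
q2 = kafka_stream("topic2").writeStream.format("console").start()

# Block here instead of on q1.awaitTermination(), which would prevent
# any code after it from running while q1 is alive
spark.streams.awaitAnyTermination()
```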
Hi Aakash,
First you will want to get the random forest model stage from the best
pipeline model result, for example if RF is the first stage:
rfModel = model.bestModel.stages[0]
Then you can check the values of the params you tuned like this:
rfModel.getNumTrees
On Mon, Apr 16, 2018 at
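Filling in the surrounding steps as a hedged sketch (the CrossValidator setup, toy data, and column names are assumptions, not from the thread):
```
from pyspark.ml import Pipeline
from pyspark.ml.classification import RandomForestClassifier
from pyspark.ml.evaluation import BinaryClassificationEvaluator
from pyspark.ml.linalg import Vectors
from pyspark.ml.tuning import CrossValidator, ParamGridBuilder
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Toy training data standing in for the real dataset
train_df = spark.createDataFrame(
    [(0.0, Vectors.dense([0.0])), (1.0, Vectors.dense([1.0]))] * 20,
    ["label", "features"])

rf = RandomForestClassifier(labelCol="label", featuresCol="features")
pipeline = Pipeline(stages=[rf])  # RF is stage 0, matching the reply above

cv = CrossValidator(
    estimator=pipeline,
    estimatorParamMaps=ParamGridBuilder().addGrid(rf.numTrees, [5, 20]).build(),
    evaluator=BinaryClassificationEvaluator(),
    numFolds=2)

model = cv.fit(train_df)
rfModel = model.bestModel.stages[0]  # the fitted RF from the best pipeline
print(rfModel.getNumTrees)           # the numTrees value the search picked
```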
Hi,
We have several Structured Streaming jobs (Spark version 2.2.0) consuming from
Kafka and writing to S3. They were running fine for a month; since yesterday a
few jobs have started failing, and I see the below exception in the failed
jobs' log:
```Tried to fetch 473151075 but the returned record
Hi,
I've got a case where the same structured query (it's a union) gives 1 stage
for one run and 5 stages for another. I could not find any pattern yet (and
it's hard to reproduce due to the volume and the application), but I'm
pretty certain that it's *never* possible that Spark 2.3 could come up
Aakash,
There are two issues here.
The issue with the code in the first question is that the first query
blocks, so the code for the second query never gets executed. Panagiotis
pointed this out correctly.
In the updated code, the issue is related to netcat (nc) and the way
structured streaming
Hi,
Unfortunately no. I just used this lib for raw FM and FFM. I thought it
could be a good baseline for your need.
Regards
Maximilien
On 16/04/18 15:43, Sundeep Kumar Mehta wrote:
Hi Maximilien,
Thanks for your response. Did you convert this repo into DStream for
continuous/incremental
You could have a really large window.
From: Aakash Basu
Date: Monday, April 16, 2018 at 10:56 AM
To: "Lalwani, Jayesh"
Cc: spark receiver , Panagiotis Garefalakis
, user
Hello there,
I am using *spark-2.3.0* compiled using the following Maven Command:
*mvn -Pyarn -Phive -Phive-thriftserver -DskipTests clean install.*
I have configured it to run with *Hive v2.3.3*. Also, all the Hive-related
jars (*v1.2.1*) in Spark's jars folder have been replaced by all the
If I use timestamp-based windowing, then my average will not be a global
average but one grouped by timestamp, which is not my requirement. I want to
recalculate the average of the entire column every time new rows come in, and
divide the other column by the updated average.
Let me know, in case you or
Change your table name in the query to spam.spamdataset instead of spamdataset.
On Sun, Apr 15, 2018 at 2:12 PM Rishikesh Gawade
wrote:
> Hello there. I am a newbie in the world of Spark. I have been working on a
> Spark Project using Java.
> I have configured Hive and
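In other words, a one-line sketch of the suggested fix (assuming the table lives in a Hive database named spam, as stated above):
```
from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()
# Qualify the table with its database instead of the bare name
spark.sql("SELECT * FROM spam.spamdataset").show()
```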
You could do it if you had a timestamp in your data. You can use windowed
operations to divide a value by its own average over a window. However, in
structured streaming, you can only window by timestamp columns. You cannot do
window aggregations on integers.
From: Aakash Basu
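A minimal sketch of the timestamp-window constraint described above (the built-in rate source here just stands in for real data; window sizes are arbitrary):
```
from pyspark.sql import SparkSession
from pyspark.sql.functions import avg, col, window

spark = SparkSession.builder.getOrCreate()

# The rate source emits (timestamp, value) rows
events = (spark.readStream.format("rate")
          .option("rowsPerSecond", 10).load()
          .withColumn("v", col("value").cast("double")))

# Streaming aggregations can only window on a timestamp column;
# windowing over the integer "value" column would be rejected
windowed = (events
            .withWatermark("timestamp", "1 minute")
            .groupBy(window("timestamp", "1 minute"))
            .agg(avg("v").alias("avg_v")))

query = windowed.writeStream.outputMode("update").format("console").start()
query.awaitTermination()
```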
Hi Maximilien,
Thanks for your response. Did you convert this repo into DStream for
continuous/incremental training?
Regards
Sundeep
On Mon, Apr 16, 2018 at 4:17 PM, Maximilien DEFOURNE <
maximilien.defou...@s4m.io> wrote:
> Hi,
>
> I used this repo for FM/FFM : https://github.com/Intel-
>
Hi,
I used this repo for FM/FFM : https://github.com/Intel-bigdata/imllib-spark
Regards
Maximilien DEFOURNE
On 15/04/18 05:14, Sundeep Kumar Mehta wrote:
Hi All,
Any library/GitHub project to use factorization machines or field-aware
factorization machines via online learning for
Hey Jayesh and Others,
Is there, then, any other way to come to a solution for this use case?
Thanks,
Aakash.
On Mon, Apr 16, 2018 at 8:11 AM, Lalwani, Jayesh <
jayesh.lalw...@capitalone.com> wrote:
> Note that what you are trying to do here is join a streaming data frame
> with an aggregated
Thank you so much TD, Matt, Anirudh and Oz,
Really appreciate this.
On Fri, Apr 13, 2018 at 9:54 PM, Oz Ben-Ami wrote:
> I can confirm that Structured Streaming works on Kubernetes, though we're
> not quite in production with that yet. Issues we're looking at are:
> -