Hi All,
How can we set Spark conf parameters in a Databricks notebook? My cluster
doesn't take any spark.conf.set properties into account... it creates 8
worker nodes (i.e. executors) but doesn't honor the supplied conf
parameters. Any ideas?
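One possible explanation (an assumption about the cause here, not something the original email confirms): spark.conf.set from a notebook can only change runtime confs, while executor-level properties such as memory, cores, and worker count are fixed when the cluster's JVMs start, so they must go in the cluster's Spark config at creation time. An illustrative config fragment (property values are made up for the example):

```properties
# Cluster-level settings: these must be set before the cluster starts.
# Calling spark.conf.set from a notebook cannot change them.
spark.executor.memory 8g
spark.executor.cores 4

# Runtime SQL confs like this one CAN be changed via spark.conf.set:
spark.sql.shuffle.partitions 64
```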
--
Regards,
Rishi Shah
Hi All,
At times when there's a data node failure, a running Spark job doesn't fail -
it gets stuck and never returns. Is there any setting that can help here? I
would ideally like the job to be terminated, or the executors running on those
data nodes to fail...
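A few Spark settings that are commonly used to avoid jobs hanging on a bad node (a sketch of one possible approach; whether these fully address the failure mode above depends on the cluster, and the values shown are illustrative):

```properties
# Re-launch suspiciously slow tasks speculatively on other nodes
spark.speculation true
# A task counts as "slow" at 2x the median task runtime
spark.speculation.multiplier 2
# Fail executors that stop responding within this timeout
spark.network.timeout 120s
# Give up on a task (and fail the job) after this many attempts
spark.task.maxFailures 4
```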
--
Regards,
Rishi Shah
With PyArrow version 0.12.1 and arrow jar version 0.10, it can run correctly.
With PyArrow version 0.12.1 and arrow jar version 0.12, this exception occurs:
> *Expected schema message in stream, was null or length 0*
PyArrow was upgraded to version 0.12.1 by another package's dependency,
resulting in the inconsistency.
This has been fixed and was included in release 0.3 last week. We will be
making another release (0.4) in the next 24 hours to include more features as well.
On Tue, Apr 30, 2019 at 12:42 AM, Manu Zhang <owenzhang1...@gmail.com> wrote:
>
> Hi,
>
>
> It seems koalas.DataFrame can't be
Thanks for the reply, Jean.
In my project, I'm working on a higher abstraction layer on top of Spark
Streaming to build a data processing product, and I'm trying to provide a
common API for Java and Scala developers.
You can see the abstract class defined here:
The slides for the talks have been uploaded to
https://analytics-zoo.github.io/master/#presentations/.
Thanks,
-Jason
On Fri, Apr 19, 2019 at 9:55 PM Khare, Ankit wrote:
> Thanks for sharing.
>
> Sent from my iPhone
>
> On 19. Apr 2019, at 01:35, Jason Dai wrote:
>
> Hi all,
>
>
> Please see
Hi Jacek,
Thanks for your reply. The case you provided was actually the same as the
second option in my original email. What I'm wondering about is the difference
between those two in terms of query performance or efficiency.
On Tue, May 14, 2019 at 3:51 PM Jacek Laskowski wrote:
> Hi,
>
> For this
Hi,
For this particular case I'd use Column.substr (
http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.Column),
e.g.
val ns = Seq(("hello world", 1, 5)).toDF("w", "b", "e")
scala> ns.select($"w".substr($"b", $"e" - $"b" + 1) as "demo").show
+-----+
| demo|
+-----+
|hello|
+-----+
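Since Column.substr takes a start position and a length, the expression passes e - b + 1 because both endpoints are inclusive and Spark's substr is 1-based. A plain-Python sketch of the same arithmetic (substr_1based is a hypothetical helper, not a Spark API):

```python
def substr_1based(s: str, b: int, e: int) -> str:
    """Inclusive, 1-based substring: mirrors substr(b, e - b + 1)."""
    length = e - b + 1
    # Convert the 1-based start to a 0-based Python slice
    return s[b - 1 : b - 1 + length]

print(substr_1based("hello world", 1, 5))   # -> hello
print(substr_1based("hello world", 7, 11))  # -> world
```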
Hi Suket, Anastasios
Many thanks for your time and your suggestions!
I tried again with various settings for the watermarks and the trigger time
- watermark 20sec, trigger 2sec
- watermark 10sec, trigger 1sec
- watermark 20sec, trigger 0sec
I also tried continuous processing mode, but since I
// Continuous trigger with one-second checkpointing interval
df.writeStream
  .format("console")
  .trigger(Trigger.Continuous("1 second"))
  .start()
On Tue, 14 May 2019 at 22:10, suket arora wrote:
> Hey Austin,
>
> If you truly want to process as a stream, use continuous streaming in
>
df = inputStream.withWatermark("eventtime", "20 seconds")
  .groupBy("sharedId", window("20 seconds", "10 seconds"))
// ProcessingTime trigger with two-second micro-batch interval
df.writeStream
  .format("console")
  .trigger(Trigger.ProcessingTime("2 seconds"))
  .start()
On Tue, 14 May
For example, I have a dataframe with 3 columns: URL, START, END. For each
url from URL column, I want to fetch a substring of it starting from START
and ending at END.
+---+-----+---+
|URL|START|END|
+---+-----+---+
There is a little bit more to it than the list you specified; nevertheless,
some data types are not directly compatible between Scala and Java and require
conversion, so it's good not to pollute your code with plenty of conversions
and to focus on using the straight API.
I don’t remember from the top
Hi Anastasios
On 5/14/19 4:15 PM, Anastasios Zouzias wrote:
> Hi Joe,
>
> How often do you trigger your mini-batch? Maybe you can specify the trigger
> time explicitly with a low value or, even better, turn it off.
>
> See:
>
Hi Suket
Sorry, this was a typo in the pseudo-code I sent. Of course, what you
suggested (using the same eventtime attribute for the watermark and the window)
is what my code does in reality. Sorry for confusing people.
On 5/14/19 4:14 PM, suket arora wrote:
> Hi Joe,
> As per the spark
Hi all,
I am wondering why we need Java-friendly APIs in Spark. Why can't we just use
the Scala APIs in Java code? What's the difference?
Some examples of Java-Friendly APIs commented in Spark code are as follows:
JavaDStream
JavaInputDStream
JavaStreamingContext
JavaSparkContext
Hi Joe,
How often do you trigger your mini-batch? Maybe you can specify the trigger
time explicitly with a low value or, even better, turn it off.
See:
https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#triggers
Best,
Anastasios
On Tue, May 14, 2019 at 3:49 PM Joe
Hi Joe,
As per the spark structured streaming documentation and I quote
"withWatermark must be called on the same column as the timestamp column
used in the aggregate. For example, df.withWatermark("time", "1
min").groupBy("time2").count() is invalid in Append output mode, as
watermark is defined
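The rule quoted above can be made concrete with a toy model (this is a simplified illustration of watermark semantics, not Spark's implementation): an event is dropped once its event time falls behind the maximum event time seen so far minus the watermark delay.

```python
def filter_late_events(events, delay):
    """Toy watermark: drop events older than (max event time seen) - delay.

    `events` is an iterable of (event_time, payload) in arrival order.
    A simplified model of the semantics, not Spark's implementation.
    """
    max_seen = float("-inf")
    kept = []
    for t, payload in events:
        max_seen = max(max_seen, t)
        if t >= max_seen - delay:
            kept.append((t, payload))
    return kept

events = [(10, "a"), (35, "b"), (12, "c"), (20, "d")]
# After "b" arrives, the watermark is 35 - 20 = 15, so (12, "c") is dropped
print(filter_late_events(events, 20))  # -> [(10, 'a'), (35, 'b'), (20, 'd')]
```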
Hi all
I'm fairly new to Spark Structured Streaming and I'm only starting to develop
an understanding of watermark handling.
Our application reads data from a Kafka input topic and, as one of the first
steps, it has to group incoming messages. Those messages come in bursts, e.g. 5
messages