Re: Jdbc Hook in Spark Batch Application

2020-12-24 Thread lec ssmi
Thanks. But there is a problem: the classes referenced in the code would need to be modified, and I want to avoid changing the existing code. On Fri, Dec 25, 2020 at 12:16 AM, Gabor Somogyi wrote: > One can wrap the JDBC driver, and that way everything can be sniffed. > > On Thu, 24 Dec 2020, 03:51
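For reference, wrapping the driver might look like the following minimal sketch; the class name SniffingDriver is hypothetical, and MySQL's com.mysql.cj.jdbc.Driver is assumed as the delegate purely for illustration:

    import java.sql.{Connection, Driver, DriverPropertyInfo}
    import java.util.Properties
    import java.util.logging.{Logger => JulLogger}

    // Hypothetical wrapper: records every connection attempt, then delegates
    // to the real driver. Register it with DriverManager.registerDriver and
    // point the JDBC "driver" option at this class instead of the vendor's.
    class SniffingDriver extends Driver {
      private val delegate = new com.mysql.cj.jdbc.Driver()

      override def connect(url: String, info: Properties): Connection = {
        println(s"JDBC connect: url=$url props=$info") // sniff connection info here
        delegate.connect(url, info)
      }
      override def acceptsURL(url: String): Boolean = delegate.acceptsURL(url)
      override def getPropertyInfo(url: String, info: Properties): Array[DriverPropertyInfo] =
        delegate.getPropertyInfo(url, info)
      override def getMajorVersion(): Int = delegate.getMajorVersion()
      override def getMinorVersion(): Int = delegate.getMinorVersion()
      override def jdbcCompliant(): Boolean = delegate.jdbcCompliant()
      override def getParentLogger(): JulLogger = delegate.getParentLogger()
    }

As the reply above notes, the existing code would still have to point its JDBC options at the wrapper class, which is exactly the kind of change a Java agent could avoid.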

Jdbc Hook in Spark Batch Application

2020-12-23 Thread lec ssmi
Hi guys, I have some Spark programs that perform database connection operations. I want to capture the connection information, such as the JDBC connection properties, without being too intrusive to the code. Any good ideas? Can a Java agent do it?

Re: [Spark Structured Streaming] Not working while worker node is on different machine

2020-12-23 Thread lec ssmi
Any more detail about it? On Fri, Dec 18, 2020 at 11:25 AM, bannya wrote: > Hi, > > I have a Spark structured streaming application that is reading data from a > Kafka topic (16 partitions). I am using standalone mode. I have two worker > nodes; one node is on the same machine as the master, and another one is

Re: Printing Logs in map-partition

2020-12-22 Thread lec ssmi
The logs printed in the map function live on the worker node; you can access them there directly, or browse them through the web UI. On Wed, Dec 23, 2020 at 1:53 PM, abby37 wrote: > I want to print some logs in the transformation mapPartitions to log the > internal working of the function. > I have used following
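A minimal sketch of the idea, assuming an existing RDD named rdd and Spark's bundled log4j 1.x; executor log lines end up in the stderr/stdout files linked from the Executors tab of the web UI:

    import org.apache.log4j.Logger

    val result = rdd.mapPartitions { iter =>
      // Create the logger inside the closure so it is instantiated on the
      // executor rather than serialized from the driver.
      val log = Logger.getLogger("partition-logger")
      log.info(s"processing a partition on ${java.net.InetAddress.getLocalHost.getHostName}")
      iter.map(_.toString)
    }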

Re: mysql connector java issue

2020-12-10 Thread lec ssmi
If you cannot assemble the JDBC driver jar into your application jar, you can put the driver jar on the Spark classpath, generally $SPARK_HOME/jars or $SPARK_HOME/lib. On Fri, Dec 11, 2020 at 5:21 AM, Artemis User wrote: > What happened was that you made the mysql jar file only available to the
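The jar can also be supplied at launch time instead of copying it into $SPARK_HOME; a minimal sketch, where the jar path and connection details are placeholders:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("mysql-example")
      // Ships the driver jar to the driver and executors without rebuilding
      // the application jar (the equivalent of spark-submit --jars).
      .config("spark.jars", "/path/to/mysql-connector-java-8.0.21.jar")
      .getOrCreate()

    val df = spark.read.format("jdbc")
      .option("url", "jdbc:mysql://host:3306/db")
      .option("driver", "com.mysql.cj.jdbc.Driver")
      .option("dbtable", "my_table")
      .option("user", "user")
      .option("password", "password")
      .load()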

Re: Using two WriteStreams in same spark structured streaming job

2020-11-04 Thread lec ssmi
You can use the *foreach* sink to achieve the logic you want. On Wed, Nov 4, 2020 at 9:56 PM, act_coder wrote: > I have a scenario where I would like to save the same streaming dataframe > to two different streaming sinks. > > I have created a streaming dataframe which I need to send to both a Kafka > topic and
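A minimal sketch using the closely related foreachBatch (Spark 2.4+), which lets one streaming query write each micro-batch to two sinks; streamingDf, the broker, topic, paths, and the assumption that the frame carries a value column are all placeholders:

    import org.apache.spark.sql.DataFrame

    val query = streamingDf.writeStream
      .foreachBatch { (batch: DataFrame, batchId: Long) =>
        batch.persist() // avoid recomputing the micro-batch for the second write
        batch.selectExpr("CAST(value AS STRING) AS value")
          .write.format("kafka")
          .option("kafka.bootstrap.servers", "broker:9092")
          .option("topic", "out-topic")
          .save()
        batch.write.mode("append").parquet("/data/out")
        batch.unpersist()
      }
      .option("checkpointLocation", "/chk/two-sinks")
      .start()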

Re: MongoDB plugin to Spark - too many open cursors

2020-10-25 Thread lec ssmi
Is the MongoDB connection pool full? On Mon, Oct 26, 2020 at 10:28 AM, Daniel Stojanov wrote: > Hi, > > I receive an error message from the MongoDB server if there are too many > Spark applications trying to access the database at the same time (about > 3 or 4), "Cannot open a new cursor since

Re: Spark Structured streaming - Kafka - slowness with query 0

2020-10-20 Thread lec ssmi
Maybe the transaction batch size. On Wed, Oct 21, 2020 at 12:19 PM, KhajaAsmath Mohammed wrote: > Yes. Changing back to latest worked but I still see the slowness compared > to flume. > > Sent from my iPhone > > On Oct 20, 2020, at 10:21 PM, lec ssmi wrote: > > Do you start your application wi

Re: Spark Structured streaming - Kafka - slowness with query 0

2020-10-20 Thread lec ssmi
Do you start your application by catching up on the earliest Kafka data? On Wed, Oct 21, 2020 at 2:19 AM, Lalwani, Jayesh wrote: > Are you getting any output? Streaming jobs typically run forever, and keep > processing data as it comes in the input. If a streaming job is working > well, it will typically generate
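The question refers to where the Kafka source begins reading, controlled by the startingOffsets option; a minimal sketch with placeholder broker and topic names:

    val stream = spark.readStream.format("kafka")
      .option("kafka.bootstrap.servers", "broker:9092")
      .option("subscribe", "events")
      // "earliest" replays the whole backlog, which can look like slowness on
      // query 0; "latest" starts from new data only.
      .option("startingOffsets", "latest")
      // Optionally cap the per-batch intake while catching up.
      .option("maxOffsetsPerTrigger", 10000)
      .load()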

Re: how to disable replace HDFS checkpoint location in structured streaming in spark3.0.1

2020-10-13 Thread lec ssmi
Sorry, the mail title is a little problematic. It should read "How to disable or replace ..". On Wed, Oct 14, 2020 at 9:27 AM, lec ssmi wrote: > I have written a demo using spark3.0.0, and the location where the > checkpoint file is saved has been explicitly specified like >> >> strea

how to disable replace HDFS checkpoint location in structured streaming in spark3.0.1

2020-10-13 Thread lec ssmi
I have written a demo using spark3.0.0, and the location where the checkpoint file is saved has been explicitly specified like > > stream.option("checkpointLocation","file:///C:\\Users\\Administrator\\Desktop\\test") But the app still throws an exception about the HDFS file system. Is
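For comparison, a minimal sketch of an explicitly local checkpoint location; df is assumed, the path is a placeholder, and on Windows the forward-slash URI form is generally the safer spelling:

    val query = df.writeStream
      .format("console")
      // A fully qualified file: URI keeps the checkpoint off fs.defaultFS
      // (often HDFS on a cluster); forward slashes avoid escaping issues.
      .option("checkpointLocation", "file:///C:/Users/Administrator/Desktop/test")
      .start()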

Re: [Structured Streaming] multiple queries in one application

2020-05-03 Thread lec ssmi
For example, put the generated queries into a list, start each one, and then call awaitTermination() on the last one. On Fri, May 1, 2020 at 10:32 AM, Abhisheks wrote: > I hope you are using the Query object that is returned by the Structured > streaming, right? > The returned object contains a lot of
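A minimal sketch of that pattern, assuming queries is a list of fully built but not-yet-started DataStreamWriters:

    // Start every query, then block the main thread on one of them.
    val started = queries.map(_.start())
    started.last.awaitTermination()

    // Built-in alternative that blocks until any query in the session stops:
    // spark.streams.awaitAnyTermination()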

[Structured Streaming] multiple queries in one application

2020-04-29 Thread lec ssmi
for this situation? Best Lec Ssmi

Re: [Structured Streaming] NullPointerException in long running query

2020-04-28 Thread lec ssmi
Apr 28, 2020 at 9:25, Jungtaek Lim > wrote: > The root cause of the exception occurred on the executor side ("Lost task 10.3 > in stage 1.0 (TID 81, spark6, executor 1)"), so you may need to check there. > > On Tue, Apr 28, 2020 at 2:52 PM lec ssmi wrote: > > Hi: > On

[Structured Streaming] NullPointerException in long running query

2020-04-27 Thread lec ssmi
on$streaming$StreamExecution$$runStream(StreamExecution.scala:279) > ... 1 more According to the exception stack, it seems to have nothing to do with the logic of my code. Is this a Spark bug or something? The version of Spark is 2.3.1. Best Lec Ssmi

Re: structured streaming Kafka consumer group.id override

2020-03-19 Thread lec ssmi
has been expired. What is the solution for this? > > On Thu, Mar 19, 2020 at 10:53 AM lec ssmi wrote: > >> 1. Maybe we can't use a customized group id in structured streaming. >> 2. When restarting from a failure or kill, the group id changes, but the >> starting off

Re: structured streaming Kafka consumer group.id override

2020-03-18 Thread lec ssmi
1. Maybe we can't use a customized group id in structured streaming. 2. When restarting after a failure or kill, the group id changes, but the starting offset will be the last one you consumed last time. On Thu, Mar 19, 2020 at 12:36 PM, Srinivas V wrote: > Hello, > 1. My Kafka consumer name is randomly being
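This describes Spark 2.x behavior. For reference, a minimal sketch of the option Spark 3.0+ added for a fixed consumer group; the broker, topic, and group names are placeholders:

    val stream = spark.readStream.format("kafka")
      .option("kafka.bootstrap.servers", "broker:9092")
      .option("subscribe", "events")
      // Spark 3.0+ only; offsets are still tracked in the checkpoint, so a
      // fixed group id mainly helps on ACL-restricted clusters.
      .option("kafka.group.id", "my-fixed-group")
      .load()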

Re: Spark Streaming with mapGroupsWithState

2020-03-02 Thread lec ssmi
Maybe you can combine the fields you want to use into one field. On Tue, Mar 3, 2020 at 6:37 AM, Something Something wrote: > I am writing a Stateful Streaming application in which I am using > mapGroupsWithState to create aggregates for Groups but I need to create > *Groups based on more than one column in
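A minimal sketch of that idea using a tuple as the combined key; the Event case class, the running count, and the session value spark are all illustrative assumptions:

    import org.apache.spark.sql.streaming.{GroupState, GroupStateTimeout}
    import spark.implicits._ // assumes a SparkSession named spark

    case class Event(region: String, device: String, value: Long)

    // Running count per (region, device): two grouping columns folded into a
    // single tuple key so mapGroupsWithState sees one key type.
    def update(key: (String, String), events: Iterator[Event],
               state: GroupState[Long]): (String, String, Long) = {
      val total = state.getOption.getOrElse(0L) + events.size
      state.update(total)
      (key._1, key._2, total)
    }

    val counts = events // events: Dataset[Event]
      .groupByKey(e => (e.region, e.device))
      .mapGroupsWithState(GroupStateTimeout.NoTimeout)(update)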

Re: dropDuplicates and watermark in structured streaming

2020-02-27 Thread lec ssmi
Such as: df.withWatermark("time","window size").dropDuplicates("id").withWatermark("time","real watermark").groupBy(window("time","window size","window size")).agg(count("id"))
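Spelled out, that chain might look like this minimal sketch; the column names and durations are placeholders, and the event-time column is included in dropDuplicates so the watermark can expire old dedup state:

    import org.apache.spark.sql.functions.{col, count, window}

    val distinctCounts = df
      .withWatermark("time", "10 minutes") // bounds the dedup state
      .dropDuplicates("id", "time")
      .withWatermark("time", "20 minutes") // the "real" watermark
      .groupBy(window(col("time"), "10 minutes"))
      .agg(count("id").as("distinct_ids"))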

Re: dropDuplicates and watermark in structured streaming

2020-02-27 Thread lec ssmi
make count(distinct) succeed? On Fri, Feb 28, 2020 at 10:25 AM, Tathagata Das wrote: > 1. Yes. All times are in event time, not processing time. So you may get 10AM > event-time data at 11AM processing time, but it will still be compared > against all data within the 9-10AM event-time window. > > 2. Show

dropDuplicates and watermark in structured streaming

2020-02-27 Thread lec ssmi
Hi: I'm new to structured streaming. Because the built-in API cannot perform a windowed count-distinct operation, I want to use dropDuplicates first, and then perform the windowed count. But in the process of using it, there are two problems: 1. Because it is streaming computing,