Re: Release Apache Spark 2.4.4

2019-08-13 Thread Terry Kim
Can the following be included? [SPARK-27234][SS][PYTHON] Use InheritableThreadLocal for current epoch in EpochTracker (to support Python UDFs) Thanks, Terry On Tue, Aug 13, 2019 at 10:24 PM Wenchen Fan wrote: > +1 > > On Wed, Aug 14, 2019 at 12:52
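For context, the fix matters because Python UDF execution spawns child threads, and a plain ThreadLocal would not propagate the current epoch to them, while InheritableThreadLocal does. A minimal standalone sketch of that semantic difference (illustrative only, not Spark's actual EpochTracker code):

    // A child thread inherits the parent's value only with InheritableThreadLocal.
    object EpochDemo extends App {
      val epoch = new InheritableThreadLocal[Long] {
        override def initialValue(): Long = -1L
      }
      epoch.set(42L)
      val child = new Thread(() => println(s"child sees epoch ${epoch.get()}")) // prints 42
      child.start()
      child.join()
    }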

Re: Release Apache Spark 2.4.4

2019-08-13 Thread Wenchen Fan
+1 On Wed, Aug 14, 2019 at 12:52 PM Holden Karau wrote: > +1 > Does anyone have any critical fixes they’d like to see in 2.4.4? > > On Tue, Aug 13, 2019 at 5:22 PM Sean Owen wrote: > >> Seems fine to me if there are enough valuable fixes to justify another >> release. If there are any other

Re: Any advice how to do this usecase in spark sql ?

2019-08-13 Thread Jörn Franke
Have you tried to join both datasets, filter accordingly, and then write the full dataset to your filesystem? Alternatively, work with a NoSQL database that you update by key (e.g., it sounds like a key/value store could be useful for you). However, it could also be that you need to do more depending on
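A minimal sketch of the join-and-write approach Jörn describes, assuming hypothetical frames df1 (from Parquet) and df2 (from Kafka), the column names from the original question, and a hypothetical "replacement" column standing in for whatever value should be swapped in:

    import org.apache.spark.sql.functions.{coalesce, col}

    // Left-join df1 against df2 on the candidate key, take the replacement
    // value where a match exists, and keep the original value otherwise.
    val replaced = df1
      .join(df2.select(col("columnX"), col("replacement")),
            col("column1") === col("columnX"), "left")
      .withColumn("column1", coalesce(col("replacement"), col("column1")))
      .drop("columnX", "replacement")

    // Write the full result back to the filesystem, e.g. as Parquet.
    replaced.write.mode("overwrite").parquet("hdfs:///output/path")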

Re: Continuous processing mode and python udf

2019-08-13 Thread Hyukjin Kwon
That's fixed in https://github.com/apache/spark/commit/b83b7927b3a85c1a4945e2224ed811b5bb804477 On Tue, Aug 13, 2019 at 12:37 PM, zenglong chen wrote: > Does Spark 2.4.0 support Python UDFs with Continuous Processing mode? > I tried it and got an error like below: > WARN scheduler.TaskSetManager:
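For reference, continuous processing is opted into with Trigger.Continuous; a minimal sketch assuming a SparkSession in scope as spark and hypothetical Kafka topics and broker address:

    import org.apache.spark.sql.streaming.Trigger

    // Continuous processing: low-latency, epoch-based execution (experimental).
    val query = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribe", "input")
      .load()
      .writeStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("topic", "output")
      .option("checkpointLocation", "/tmp/checkpoint")
      .trigger(Trigger.Continuous("1 second"))
      .start()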

Re: Release Apache Spark 2.4.4

2019-08-13 Thread Holden Karau
+1 Does anyone have any critical fixes they’d like to see in 2.4.4? On Tue, Aug 13, 2019 at 5:22 PM Sean Owen wrote: > Seems fine to me if there are enough valuable fixes to justify another > release. If there are any other important fixes imminent, it's fine to > wait for those. > > > On Tue,

RE: Release Apache Spark 2.4.4

2019-08-13 Thread Kazuaki Ishizaki
Thanks, Dongjoon! +1 Kazuaki Ishizaki From: Hyukjin Kwon To: Takeshi Yamamuro Cc: Dongjoon Hyun, dev, User Date: 2019/08/14 09:21 Subject: [EXTERNAL] Re: Release Apache Spark 2.4.4 +1 On Wed, Aug 14, 2019 at 9:13 AM, Takeshi Yamamuro wrote: Hi, Thanks for your

Any advice how to do this usecase in spark sql ?

2019-08-13 Thread Shyam P
Hi, any advice on how to do this in Spark SQL? I have a scenario as below: dataframe1 = loaded from an HDFS Parquet file. dataframe2 = read from a Kafka stream. If the column1 value of dataframe1 is in the columnX values of dataframe2, then I need to replace the column1 value of dataframe1.

Re: Sreate temp table or view by using location

2019-08-13 Thread 霍战锋
It works by using "create temp view" and "options(path='something')" together, thanks. spark.sql("""create temp view people (name string, age int) using csv options(sep=',',inferSchema='true',ignoreLeadingWhiteSpace='true',path='src/main/resources/people.txt')""") Best Regards 霍战锋
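For comparison, the OPTIONS in that SQL map one-to-one onto DataFrame reader options; a sketch of the same session-scoped view built through the DataFrame API (assuming a SparkSession in scope as spark):

    // Equivalent temp view via the CSV reader; column names are supplied
    // with toDF since the sample file has no header row.
    val people = spark.read
      .option("sep", ",")
      .option("inferSchema", "true")
      .option("ignoreLeadingWhiteSpace", "true")
      .csv("src/main/resources/people.txt")
      .toDF("name", "age")
    people.createOrReplaceTempView("people")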

Re: Release Apache Spark 2.4.4

2019-08-13 Thread Sean Owen
Seems fine to me if there are enough valuable fixes to justify another release. If there are any other important fixes imminent, it's fine to wait for those. On Tue, Aug 13, 2019 at 6:16 PM Dongjoon Hyun wrote: > > Hi, All. > > Spark 2.4.3 was released three months ago (8th May). > As of today

Re: Release Apache Spark 2.4.4

2019-08-13 Thread Hyukjin Kwon
+1 On Wed, Aug 14, 2019 at 9:13 AM, Takeshi Yamamuro wrote: > Hi, > > Thanks for your notification, Dongjoon! > I put some links for the other committers/PMCs to access the info easily: > > A commit list on GitHub from the last release: >

Re: Release Apache Spark 2.4.4

2019-08-13 Thread Takeshi Yamamuro
Hi, Thanks for your notification, Dongjoon! I put some links for the other committers/PMCs to access the info easily: A commit list on GitHub from the last release: https://github.com/apache/spark/compare/5ac2014e6c118fbeb1fe8e5c8064c4a8ee9d182a...branch-2.4 An issue list in JIRA:

Re: Release Apache Spark 2.4.4

2019-08-13 Thread DB Tsai
+1 On Tue, Aug 13, 2019 at 4:16 PM Dongjoon Hyun wrote: > > Hi, All. > > Spark 2.4.3 was released three months ago (8th May). > As of today (13th August), there are 112 commits (75 JIRAs) in `branch-2.4` > since 2.4.3. > > It would be great if we can have Spark 2.4.4. > Shall we start `2.4.4

Release Apache Spark 2.4.4

2019-08-13 Thread Dongjoon Hyun
Hi, All. Spark 2.4.3 was released three months ago (8th May). As of today (13th August), there are 112 commits (75 JIRAs) in `branch-2.4` since 2.4.3. It would be great if we can have Spark 2.4.4. Shall we start `2.4.4 RC1` next Monday (19th August)? Last time, there was a request for a K8s issue

Re: Custom aggregations: modular and lightweight solutions?

2019-08-13 Thread Andrew Leverentz
Here's a simpler example that I think gets at the heart of what I'm trying to do: DynamicSchemaExample.scala. Here, I'm dynamically creating a sequence of Rows and also dynamically creating a corresponding schema (StructType), but
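The linked file isn't reproduced in this digest, but a minimal sketch of that general pattern (field names and values made up for illustration, SparkSession in scope as spark):

    import org.apache.spark.sql.Row
    import org.apache.spark.sql.types._

    // Field names and types decided at runtime rather than via case classes.
    val fields = Seq(("id", IntegerType), ("label", StringType))
    val schema = StructType(fields.map { case (name, tpe) =>
      StructField(name, tpe, nullable = true)
    })

    // Rows are built positionally to match the schema.
    val rows = Seq(Row(1, "alpha"), Row(2, "beta"))
    val df = spark.createDataFrame(spark.sparkContext.parallelize(rows), schema)
    df.show()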

Spark Streaming concurrent calls

2019-08-13 Thread Amit Sharma
I am using Kafka Spark Streaming. My UI application sends requests to streaming through Kafka. The problem is that streaming handles one request at a time, so if multiple users send requests at the same time, they have to wait until earlier requests are done. Is there any way it can handle multiple requests?

help understanding physical plan

2019-08-13 Thread Marcelo Valle
Hi, I have a job running on AWS EMR. It's basically a join between 2 tables (Parquet files on S3), one somewhat large (around 50 GB) and the other small (less than 1 GB). The small table is the result of other operations, but it was a dataframe with `.persist(StorageLevel.MEMORY_AND_DISK_SER)` and the
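For inspecting how such a join is planned, explain(true) prints the parsed, analyzed, optimized, and physical plans; a sketch with hypothetical paths and join key:

    import org.apache.spark.storage.StorageLevel

    val large = spark.read.parquet("s3://bucket/large")   // ~50 GB
    val small = spark.read.parquet("s3://bucket/small")   // < 1 GB
      .persist(StorageLevel.MEMORY_AND_DISK_SER)

    val joined = large.join(small, Seq("key"))
    joined.explain(true) // shows whether Spark chose e.g. a broadcast or sort-merge join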

Re: Sreate temp table or view by using location

2019-08-13 Thread 霍战锋
Sorry for the typo. The title is 'Create temp table or view by using location'. Best Regards On Tue, Aug 13, 2019 at 8:00 PM, 霍战锋 wrote: > Hi, > > I'm trying to use Spark SQL to define a temp table which can be > destroyed automatically with the session. But when I use the SQL > below, I can't query any

Sreate temp table or view by using location

2019-08-13 Thread 霍战锋
Hi, I'm trying to use Spark SQL to define a temp table which can be destroyed automatically with the session. But when I use the SQL below, I can't query any valid rows; meanwhile, it works when I delete the word 'temp'. Can anyone tell me how to write the right SQL? It doesn't

how to specify which partition each record send on spark structured streaming kafka sink?

2019-08-13 Thread zenglong chen
The key option does not work!
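For context (an assumption about the 2.4-era sink, not confirmed in this thread): the Spark 2.4 Kafka sink has no writable partition column, so partition placement is driven by the key column through Kafka's default partitioner (or a custom producer partitioner passed via a kafka.-prefixed option). A sketch with hypothetical column, topic, and broker names:

    // Records with the same key land in the same partition under
    // Kafka's default partitioner (hash of the key bytes).
    df.selectExpr("CAST(userId AS STRING) AS key", "to_json(struct(*)) AS value")
      .writeStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("topic", "output")
      .option("checkpointLocation", "/tmp/ckpt")
      .start()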

Re: Spark streaming dataframe extract message to new columns

2019-08-13 Thread Tianlang
Hi, do you mean you have a column A and you want to extract A1 and A2 from A? For example, for a column A value of 123456,2019-08-07, the A1 value is 123456 and the A2 value is 2019-08-07. If that's the case you can use df.select like this (note that split takes the delimiter as its second argument): df.select(split('A, ",")(0) as "A1", split('A, ",")(1) as "A2") Good luck. On 2019/8/12
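A self-contained version of that suggestion (assuming a SparkSession in scope as spark):

    import org.apache.spark.sql.functions.split
    import spark.implicits._

    val df = Seq("123456,2019-08-07").toDF("A")
    df.select(split($"A", ",")(0).as("A1"), split($"A", ",")(1).as("A2")).show()
    // +------+----------+
    // |    A1|        A2|
    // +------+----------+
    // |123456|2019-08-07|
    // +------+----------+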