Re: [ANNOUNCE] New Apache Flink PMC Member - Dong Lin

2023-02-15 Thread Juntao Hu
Congratulations, Dong!

Best regards,
Juntao

Leonard Xu wrote on Thu, Feb 16, 2023 at 15:16:

> Congratulations, Dong!
>
> Best,
> Leonard
>
> > On Feb 16, 2023, at 3:03 PM, Dian Fu  wrote:
> >
> > Congratulations Dong!
> >
> > Regards,
> > Dian
> >
> > On Thu, Feb 16, 2023 at 2:41 PM Junrui Lee  wrote:
> >
> >> Congratulations, Dong!
> >>
> >> Best,
> >> Junrui
> >>
> >> Guowei Ma wrote on Thu, Feb 16, 2023 at 14:21:
> >>
> >>> Hi, everyone
> >>>
> >>>On behalf of the PMC, I'm very happy to announce Dong Lin as a new
> >>> Flink PMC.
> >>>
> >>>Dong is currently the main driver of Flink ML. He reviewed a large
> >>> number of Flink ML related PRs and also participated in many Flink ML
> >>> improvements, such as FLIP-173 and FLIP-174. At the same time, he made
> >>> many contributions to evangelism events for the Flink ML ecosystem.
> >>>In fact, in addition to the Flink machine learning field, Dong has also
> >>> participated in many other Flink improvements, such as FLIP-205,
> >>> FLIP-266, FLIP-269, and FLIP-274.
> >>>Please join me in congratulating Dong Lin for becoming a Flink PMC!
> >>>
> >>> Best,
> >>> Guowei(on behalf of the Flink PMC)
> >>>
> >>
>
>


Re: Re: [ANNOUNCE] New Apache Flink Committer - Yuxia Luo

2023-03-12 Thread Juntao Hu
Congratulations, Yuxia!

Best,
Juntao


Wencong Liu wrote on Mon, Mar 13, 2023 at 11:33:

> Congratulations, Yuxia!
>
> Best,
> Wencong Liu
>
>
> At 2023-03-13 11:20:21, "Qingsheng Ren"  wrote:
> >Congratulations, Yuxia!
> >
> >Best,
> >Qingsheng
> >
> >On Mon, Mar 13, 2023 at 10:27 AM Jark Wu  wrote:
> >
> >> Hi, everyone
> >>
> >> On behalf of the PMC, I'm very happy to announce Yuxia Luo as a new
> Flink
> >> Committer.
> >>
> >> Yuxia has been continuously contributing to the Flink project for almost
> >> two years, authoring and reviewing hundreds of PRs over this time. He is
> >> currently
> >> the core maintainer of the Hive component, where he contributed many
> >> valuable
> >> features, including the Hive dialect with 95% compatibility and small
> file
> >> compaction.
> >> In addition, Yuxia drove FLIP-282 (DELETE & UPDATE API) to better
> >> integrate
> >> Flink with data lakes. He actively participated in dev discussions and
> >> answered
> >> many questions on the user mailing list.
> >>
> >> Please join me in congratulating Yuxia Luo for becoming a Flink
> Committer!
> >>
> >> Best,
> >> Jark Wu (on behalf of the Flink PMC)
> >>
>


[jira] [Created] (FLINK-26536) PyFlink RemoteKeyedStateBackend#merge_namespaces bug

2022-03-08 Thread Juntao Hu (Jira)
Juntao Hu created FLINK-26536:
-

 Summary: PyFlink RemoteKeyedStateBackend#merge_namespaces bug
 Key: FLINK-26536
 URL: https://issues.apache.org/jira/browse/FLINK-26536
 Project: Flink
  Issue Type: Bug
  Components: API / Python
Affects Versions: 1.14.3
Reporter: Juntao Hu
 Fix For: 1.15.0


There are two bugs in RemoteKeyedStateBackend#merge_namespaces:
 * data in source namespaces is not committed before merging
 * the target namespace is added at the head of sources_bytes, which causes the
Java-side SimpleStateRequestHandler to leave one source namespace unmerged
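The first bug can be illustrated with a toy model in plain Python (a minimal sketch under assumptions: the class and method names here are illustrative, not the real RemoteKeyedStateBackend internals): pending writes to the source namespaces must be flushed before the merge is performed, or the cached values are lost.

```python
# Toy model of a merging state backend (illustrative names only): uncommitted
# writes live in a cache and must be committed before the sources are merged.
class ToyMergingBackend:
    def __init__(self):
        self.cache = {}   # namespace -> uncommitted values
        self.store = {}   # namespace -> committed values

    def write(self, ns, value):
        self.cache.setdefault(ns, []).append(value)

    def commit(self, ns):
        self.store.setdefault(ns, []).extend(self.cache.pop(ns, []))

    def merge_namespaces(self, target, sources):
        # The fix for the first bug: commit pending data in every source
        # namespace before merging, so no cached values are dropped.
        for ns in sources:
            self.commit(ns)
        merged = []
        for ns in sources:
            merged.extend(self.store.pop(ns, []))
        self.store.setdefault(target, []).extend(merged)

backend = ToyMergingBackend()
backend.write("n1", 1)
backend.write("n2", 2)
backend.merge_namespaces("t", ["n1", "n2"])
# backend.store["t"] now holds [1, 2]; without the commit step the
# uncommitted values would have been lost.
```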



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-26763) PyFlink support using system conda when running checks and tests

2022-03-21 Thread Juntao Hu (Jira)
Juntao Hu created FLINK-26763:
-

 Summary: PyFlink support using system conda when running checks 
and tests
 Key: FLINK-26763
 URL: https://issues.apache.org/jira/browse/FLINK-26763
 Project: Flink
  Issue Type: Improvement
  Components: API / Python
Reporter: Juntao Hu


Currently, the lint-python.sh script uses a hardcoded conda path under the dev
directory, along with several tools from the "base" env of that conda
installation. It would be great to reuse the system conda and a user-specified
env rather than installing a dedicated one.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-26775) PyFlink WindowOperator#process_element register wrong cleanup timer

2022-03-21 Thread Juntao Hu (Jira)
Juntao Hu created FLINK-26775:
-

 Summary: PyFlink WindowOperator#process_element register wrong 
cleanup timer
 Key: FLINK-26775
 URL: https://issues.apache.org/jira/browse/FLINK-26775
 Project: Flink
  Issue Type: Bug
  Components: API / Python
Affects Versions: 1.15.0
Reporter: Juntao Hu
 Fix For: 1.15.0


In window_operator.py line 378, when dealing with a merging window assigner:
{code:python}
self.register_cleanup_timer(window)
{code}
This should register a cleanup timer for `actual_window`. However, it does not
cause window-emitting bugs as long as the session window trigger is implemented
correctly.
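The distinction can be sketched with a toy model (a minimal sketch under assumptions: none of these names are PyFlink's real API): with a merging window assigner, the element's original window may already have been merged into a larger one, and the cleanup timer should track the merged window.

```python
# Toy model of the merging-assigner path: the cleanup timer must be
# registered for the post-merge window, not the element's original window.
timers = set()

def register_cleanup_timer(window):
    timers.add(window)

def process_element(window, merge_results):
    # `merge_results` maps a pre-merge window to the window it merged into
    actual_window = merge_results.get(window, window)
    register_cleanup_timer(actual_window)  # the presumed fix: not `window`
    return actual_window

merge_results = {(0, 5): (0, 8)}  # window (0, 5) was merged into (0, 8)
w = process_element((0, 5), merge_results)
# the cleanup timer now tracks (0, 8), the window that actually exists
```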



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-27414) Support operator state in PyFlink DataStream API

2022-04-26 Thread Juntao Hu (Jira)
Juntao Hu created FLINK-27414:
-

 Summary: Support operator state in PyFlink DataStream API
 Key: FLINK-27414
 URL: https://issues.apache.org/jira/browse/FLINK-27414
 Project: Flink
  Issue Type: New Feature
  Components: API / Python
Reporter: Juntao Hu


Currently, operator state is not supported in PyFlink, and it is a blocker for
the broadcast stream implementation.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (FLINK-27560) Refactor SimpleStateRequestHandler for PyFlink state

2022-05-09 Thread Juntao Hu (Jira)
Juntao Hu created FLINK-27560:
-

 Summary: Refactor SimpleStateRequestHandler for PyFlink state
 Key: FLINK-27560
 URL: https://issues.apache.org/jira/browse/FLINK-27560
 Project: Flink
  Issue Type: Improvement
  Components: API / Python
Affects Versions: 1.15.0
Reporter: Juntao Hu
 Fix For: 1.16.0


Currently, SimpleStateRequestHandler.java, which handles Beam state requests
from the Python side, is coupled with keyed-state logic. It could be refactored
to reduce code duplication when implementing operator state (list/broadcast
state can fit into the bag/map logic).



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (FLINK-27586) Support non-keyed co-broadcast processing

2022-05-12 Thread Juntao Hu (Jira)
Juntao Hu created FLINK-27586:
-

 Summary: Support non-keyed co-broadcast processing
 Key: FLINK-27586
 URL: https://issues.apache.org/jira/browse/FLINK-27586
 Project: Flink
  Issue Type: Sub-task
  Components: API / Python
Affects Versions: 1.15.0
Reporter: Juntao Hu
 Fix For: 1.16.0


Support DataStream.connect(BroadcastStream).process().



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (FLINK-27587) Support keyed co-broadcast processing

2022-05-12 Thread Juntao Hu (Jira)
Juntao Hu created FLINK-27587:
-

 Summary: Support keyed co-broadcast processing
 Key: FLINK-27587
 URL: https://issues.apache.org/jira/browse/FLINK-27587
 Project: Flink
  Issue Type: Sub-task
  Components: API / Python
Affects Versions: 1.15.0
Reporter: Juntao Hu
 Fix For: 1.16.0


Support 
KeyedStream.connect(BroadcastStream).process().



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (FLINK-27588) Update broadcast state related docs for PyFlink

2022-05-12 Thread Juntao Hu (Jira)
Juntao Hu created FLINK-27588:
-

 Summary: Update broadcast state related docs for PyFlink
 Key: FLINK-27588
 URL: https://issues.apache.org/jira/browse/FLINK-27588
 Project: Flink
  Issue Type: Sub-task
  Components: API / Python
Affects Versions: 1.15.0
Reporter: Juntao Hu
 Fix For: 1.16.0






--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (FLINK-27657) Implement remote operator state backend in PyFlink

2022-05-16 Thread Juntao Hu (Jira)
Juntao Hu created FLINK-27657:
-

 Summary: Implement remote operator state backend in PyFlink
 Key: FLINK-27657
 URL: https://issues.apache.org/jira/browse/FLINK-27657
 Project: Flink
  Issue Type: Sub-task
  Components: API / Python
Affects Versions: 1.15.0
Reporter: Juntao Hu
 Fix For: 1.16.0


This is for supporting broadcast state; the existing map state implementation
and caching handler can be reused.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (FLINK-27676) Output records from on_timer are behind the triggering watermark in PyFlink

2022-05-18 Thread Juntao Hu (Jira)
Juntao Hu created FLINK-27676:
-

 Summary: Output records from on_timer are behind the triggering 
watermark in PyFlink
 Key: FLINK-27676
 URL: https://issues.apache.org/jira/browse/FLINK-27676
 Project: Flink
  Issue Type: Bug
  Components: API / Python
Affects Versions: 1.15.0
Reporter: Juntao Hu
 Fix For: 1.16.0


Currently, when dealing with watermarks in AbstractPythonFunctionOperator,
super.processWatermark(mark) is called, which advances the watermark in
timeServiceManager, triggering timers, and then emits the current watermark.
However, timer triggering is not synchronous in PyFlink (processTimer only puts
data into the Beam buffer), so when the remote bundle is closed and the output
records produced by the on_timer function finally arrive at the Java side, they
are already behind the triggering watermark.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (FLINK-27733) Rework on_timer output behind watermark bug fix

2022-05-22 Thread Juntao Hu (Jira)
Juntao Hu created FLINK-27733:
-

 Summary: Rework on_timer output behind watermark bug fix
 Key: FLINK-27733
 URL: https://issues.apache.org/jira/browse/FLINK-27733
 Project: Flink
  Issue Type: Improvement
  Components: API / Python
Affects Versions: 1.15.0
Reporter: Juntao Hu
 Fix For: 1.16.0


FLINK-27676 can be simplified by just checking isBundleFinished() before
emitting the watermark in AbstractPythonFunctionOperator, and this fixes
FLINK-27676 in the Python group window aggregate as well.
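The simplified fix can be sketched with a toy model (plain Python standing in for the Java operator; a minimal sketch under assumptions, not Flink's actual classes):

```python
# Toy model: records produced by on_timer sit in the bundle buffer, so the
# operator must flush the bundle before emitting the watermark that
# triggered those timers -- otherwise the records arrive "late".
class ToyPythonOperator:
    def __init__(self):
        self.bundle = []   # buffered output from asynchronously fired timers
        self.output = []

    def is_bundle_finished(self):
        return not self.bundle

    def flush_bundle(self):
        self.output.extend(self.bundle)
        self.bundle.clear()

    def process_watermark(self, mark):
        # timers fire, but their output only lands in the bundle buffer
        self.bundle.append(("timer-output", mark))
        # the simplified check: flush the bundle before emitting
        if not self.is_bundle_finished():
            self.flush_bundle()
        self.output.append(("watermark", mark))

op = ToyPythonOperator()
op.process_watermark(100)
# the timer output now precedes the watermark that triggered it
```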



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (FLINK-27901) support TableEnvironment.create(configuration) in PyFlink

2022-06-06 Thread Juntao Hu (Jira)
Juntao Hu created FLINK-27901:
-

 Summary: support TableEnvironment.create(configuration) in PyFlink
 Key: FLINK-27901
 URL: https://issues.apache.org/jira/browse/FLINK-27901
 Project: Flink
  Issue Type: Improvement
  Components: API / Python
Affects Versions: 1.15.0
Reporter: Juntao Hu
 Fix For: 1.16.0


Align TableEnvironment.create(configuration) API with Java.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (FLINK-27932) align noWatermarks and withWatermarkAlignment API in PyFlink

2022-06-07 Thread Juntao Hu (Jira)
Juntao Hu created FLINK-27932:
-

 Summary: align noWatermarks and withWatermarkAlignment API in PyFlink
 Key: FLINK-27932
 URL: https://issues.apache.org/jira/browse/FLINK-27932
 Project: Flink
  Issue Type: New Feature
  Components: API / Python
Affects Versions: 1.15.0
Reporter: Juntao Hu
 Fix For: 1.16.0


Add no_watermarks and with_watermark_alignment to align with the Java API.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (FLINK-28043) "Invalid lambda deserialization" in AvroParquetReaders

2022-06-14 Thread Juntao Hu (Jira)
Juntao Hu created FLINK-28043:
-

 Summary: "Invalid lambda deserialization" in AvroParquetReaders
 Key: FLINK-28043
 URL: https://issues.apache.org/jira/browse/FLINK-28043
 Project: Flink
  Issue Type: Bug
  Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
Affects Versions: 1.15.0
Reporter: Juntao Hu
 Fix For: 1.16.0


I packed a bundle jar including flink-parquet and flink-avro with
"org.apache.avro" relocated, to support PyFlink reading Avro records from
Parquet files, and an "Invalid lambda deserialization" error occurs at runtime.
I guess this is similar to FLINK-18006 and points to MSHADE-260.

 



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (FLINK-28182) Support Avro generic record decoder in PyFlink

2022-06-21 Thread Juntao Hu (Jira)
Juntao Hu created FLINK-28182:
-

 Summary: Support Avro generic record decoder in PyFlink
 Key: FLINK-28182
 URL: https://issues.apache.org/jira/browse/FLINK-28182
 Project: Flink
  Issue Type: New Feature
  Components: API / Python
Affects Versions: 1.15.0
Reporter: Juntao Hu
 Fix For: 1.16.0


An Avro generic record decoder is useful for formats like parquet-avro, and
enables PyFlink users to read Parquet files into Python native objects given an
Avro schema.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (FLINK-28323) Support using new KafkaSource in PyFlink

2022-06-30 Thread Juntao Hu (Jira)
Juntao Hu created FLINK-28323:
-

 Summary: Support using new KafkaSource in PyFlink
 Key: FLINK-28323
 URL: https://issues.apache.org/jira/browse/FLINK-28323
 Project: Flink
  Issue Type: New Feature
  Components: API / Python
Affects Versions: 1.15.0
Reporter: Juntao Hu
 Fix For: 1.16.0


KafkaSource implements the new unified Source API, which should also be
introduced to the Python API so that other APIs, e.g. HybridSource, can use it.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-28336) Support parquet-avro format in PyFlink DataStream

2022-06-30 Thread Juntao Hu (Jira)
Juntao Hu created FLINK-28336:
-

 Summary: Support parquet-avro format in PyFlink DataStream
 Key: FLINK-28336
 URL: https://issues.apache.org/jira/browse/FLINK-28336
 Project: Flink
  Issue Type: New Feature
  Components: API / Python
Affects Versions: 1.15.0
Reporter: Juntao Hu
 Fix For: 1.16.0


Parquet-avro has three interfaces: ReflectData, SpecificData and GenericData.
Since the first two interfaces need a corresponding Java class, we only support
GenericData in PyFlink, where users use strings to define the Avro schema.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-28340) Support using system conda env for PyFlink tests

2022-06-30 Thread Juntao Hu (Jira)
Juntao Hu created FLINK-28340:
-

 Summary: Support using system conda env for PyFlink tests
 Key: FLINK-28340
 URL: https://issues.apache.org/jira/browse/FLINK-28340
 Project: Flink
  Issue Type: Improvement
  Components: API / Python
Affects Versions: 1.15.0
Reporter: Juntao Hu
 Fix For: 1.16.0






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-28393) Support AvroInputFormat in PyFlink

2022-07-05 Thread Juntao Hu (Jira)
Juntao Hu created FLINK-28393:
-

 Summary: Support AvroInputFormat in PyFlink
 Key: FLINK-28393
 URL: https://issues.apache.org/jira/browse/FLINK-28393
 Project: Flink
  Issue Type: New Feature
  Components: API / Python
Affects Versions: 1.15
Reporter: Juntao Hu
 Fix For: 1.16.0






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-28413) Support ParquetColumnarRowInputFormat in PyFlink

2022-07-05 Thread Juntao Hu (Jira)
Juntao Hu created FLINK-28413:
-

 Summary: Support ParquetColumnarRowInputFormat in PyFlink
 Key: FLINK-28413
 URL: https://issues.apache.org/jira/browse/FLINK-28413
 Project: Flink
  Issue Type: New Feature
  Components: API / Python
Affects Versions: 1.15.1
Reporter: Juntao Hu
 Fix For: 1.16.0






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-28464) Support CsvReaderFormat in PyFlink

2022-07-08 Thread Juntao Hu (Jira)
Juntao Hu created FLINK-28464:
-

 Summary: Support CsvReaderFormat in PyFlink
 Key: FLINK-28464
 URL: https://issues.apache.org/jira/browse/FLINK-28464
 Project: Flink
  Issue Type: New Feature
  Components: API / Python
Affects Versions: 1.15.1
Reporter: Juntao Hu
 Fix For: 1.16.0






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-28510) Support using new KafkaSink in PyFlink

2022-07-12 Thread Juntao Hu (Jira)
Juntao Hu created FLINK-28510:
-

 Summary: Support using new KafkaSink in PyFlink
 Key: FLINK-28510
 URL: https://issues.apache.org/jira/browse/FLINK-28510
 Project: Flink
  Issue Type: New Feature
  Components: API / Python
Affects Versions: 1.15.1
Reporter: Juntao Hu
 Fix For: 1.16.0






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-28664) Support AvroWriters in PyFlink

2022-07-24 Thread Juntao Hu (Jira)
Juntao Hu created FLINK-28664:
-

 Summary: Support AvroWriters in PyFlink
 Key: FLINK-28664
 URL: https://issues.apache.org/jira/browse/FLINK-28664
 Project: Flink
  Issue Type: New Feature
  Components: API / Python
Affects Versions: 1.15.1
Reporter: Juntao Hu
 Fix For: 1.16.0






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-28740) Support CsvBulkWriter in PyFlink

2022-07-29 Thread Juntao Hu (Jira)
Juntao Hu created FLINK-28740:
-

 Summary: Support CsvBulkWriter in PyFlink
 Key: FLINK-28740
 URL: https://issues.apache.org/jira/browse/FLINK-28740
 Project: Flink
  Issue Type: New Feature
  Components: API / Python
Affects Versions: 1.15.1
Reporter: Juntao Hu
 Fix For: 1.16.0


Java users can just use a lambda to construct a BulkWriterFactory for
CsvBulkWriter, but Python users still need a ready-to-use method to create a
factory.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-28762) Support AvroParquetWriters in PyFlink

2022-08-01 Thread Juntao Hu (Jira)
Juntao Hu created FLINK-28762:
-

 Summary: Support AvroParquetWriters in PyFlink
 Key: FLINK-28762
 URL: https://issues.apache.org/jira/browse/FLINK-28762
 Project: Flink
  Issue Type: New Feature
  Components: API / Python
Affects Versions: 1.15.1
Reporter: Juntao Hu
 Fix For: 1.16.0






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-28782) Support FileSink compaction in PyFlink

2022-08-02 Thread Juntao Hu (Jira)
Juntao Hu created FLINK-28782:
-

 Summary: Support FileSink compaction in PyFlink
 Key: FLINK-28782
 URL: https://issues.apache.org/jira/browse/FLINK-28782
 Project: Flink
  Issue Type: New Feature
  Components: API / Python
Affects Versions: 1.15.1
Reporter: Juntao Hu
 Fix For: 1.16.0






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-28827) Complete DataType support in DataStream API for PyFlink

2022-08-05 Thread Juntao Hu (Jira)
Juntao Hu created FLINK-28827:
-

 Summary: Complete DataType support in DataStream API for PyFlink
 Key: FLINK-28827
 URL: https://issues.apache.org/jira/browse/FLINK-28827
 Project: Flink
  Issue Type: New Feature
  Components: API / Python
Affects Versions: 1.15.1
Reporter: Juntao Hu
 Fix For: 1.16.0


DataTypes are used for some FileSource formats originating from the Table API,
e.g. ParquetColumnarRowInputFormat, but some types don't work in the current
PyFlink implementation, e.g. TIMESTAMP_LTZ.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-28862) Support writing RowData to Parquet files in PyFlink

2022-08-08 Thread Juntao Hu (Jira)
Juntao Hu created FLINK-28862:
-

 Summary: Support writing RowData to Parquet files in PyFlink
 Key: FLINK-28862
 URL: https://issues.apache.org/jira/browse/FLINK-28862
 Project: Flink
  Issue Type: New Feature
  Components: API / Python
Affects Versions: 1.15.1
Reporter: Juntao Hu
 Fix For: 1.16.0






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-28876) Support writing RowData to Orc files in PyFlink

2022-08-08 Thread Juntao Hu (Jira)
Juntao Hu created FLINK-28876:
-

 Summary: Support writing RowData to Orc files in PyFlink
 Key: FLINK-28876
 URL: https://issues.apache.org/jira/browse/FLINK-28876
 Project: Flink
  Issue Type: New Feature
  Components: API / Python
Affects Versions: 1.15.1
Reporter: Juntao Hu






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-28880) Fix CEP doc with wrong result of strict contiguity of looping patterns

2022-08-08 Thread Juntao Hu (Jira)
Juntao Hu created FLINK-28880:
-

 Summary: Fix CEP doc with wrong result of strict contiguity of 
looping patterns
 Key: FLINK-28880
 URL: https://issues.apache.org/jira/browse/FLINK-28880
 Project: Flink
  Issue Type: Bug
  Components: Library / CEP
Affects Versions: 1.15.1
Reporter: Juntao Hu
 Fix For: 1.16.0


https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/libs/cep/#contiguity-within-looping-patterns
The result of strict contiguity should be {a b1 c}, {a b2 c}, {a b3 c}, since b 
is *followed by* c.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-28895) Eliminate the map to convert RowData to Row before RowData FileSink

2022-08-09 Thread Juntao Hu (Jira)
Juntao Hu created FLINK-28895:
-

 Summary: Eliminate the map to convert RowData to Row before 
RowData FileSink
 Key: FLINK-28895
 URL: https://issues.apache.org/jira/browse/FLINK-28895
 Project: Flink
  Issue Type: Improvement
  Components: API / Python
Affects Versions: 1.15.1
Reporter: Juntao Hu
 Fix For: 1.16.0






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-28904) Add missing connector/format doc for PyFlink

2022-08-10 Thread Juntao Hu (Jira)
Juntao Hu created FLINK-28904:
-

 Summary: Add missing connector/format doc for PyFlink
 Key: FLINK-28904
 URL: https://issues.apache.org/jira/browse/FLINK-28904
 Project: Flink
  Issue Type: Improvement
  Components: API / Python
Affects Versions: 1.15.1
Reporter: Juntao Hu
 Fix For: 1.16.0






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-28908) Coder for LIST type is incorrectly chosen in PyFlink

2022-08-10 Thread Juntao Hu (Jira)
Juntao Hu created FLINK-28908:
-

 Summary: Coder for LIST type is incorrectly chosen in PyFlink
 Key: FLINK-28908
 URL: https://issues.apache.org/jira/browse/FLINK-28908
 Project: Flink
  Issue Type: Bug
  Components: API / Python
Affects Versions: 1.15.1, 1.14.5
Reporter: Juntao Hu
 Fix For: 1.16.0, 1.15.2, 1.14.6


Code to reproduce this bug (the result is `[None, None, None]`):
{code:python}
jvm = get_gateway().jvm
env = StreamExecutionEnvironment.get_execution_environment()
j_item = jvm.java.util.ArrayList()
j_item.add(1)
j_item.add(2)
j_item.add(3)
j_list = jvm.java.util.ArrayList()
j_list.add(j_item)
type_info = Types.LIST(Types.INT())
ds = DataStream(env._j_stream_execution_environment.fromCollection(
    j_list, type_info.get_java_type_info()))
ds.map(lambda e: print(e))
env.execute()
{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-29477) ClassCastException when collect primitive array to Python

2022-09-29 Thread Juntao Hu (Jira)
Juntao Hu created FLINK-29477:
-

 Summary: ClassCastException when collect primitive array to Python
 Key: FLINK-29477
 URL: https://issues.apache.org/jira/browse/FLINK-29477
 Project: Flink
  Issue Type: Bug
  Components: API / Python
Affects Versions: 1.15.2, 1.16.0
Reporter: Juntao Hu
 Fix For: 1.17.0, 1.15.3, 1.16.1


How to reproduce this bug:
{code:python}
ds = env.from_collection([1, 2], type_info=Types.PRIMITIVE_ARRAY(Types.INT()))
ds.execute_and_collect()
{code}
got:
{code:java}
java.lang.ClassCastException: class [I cannot be cast to class [Ljava.lang.Object
{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-29648) "LocalDateTime not supported" error when retrieving Java TypeInformation from PyFlink

2022-10-15 Thread Juntao Hu (Jira)
Juntao Hu created FLINK-29648:
-

 Summary: "LocalDateTime not supported" error when retrieving Java 
TypeInformation from PyFlink
 Key: FLINK-29648
 URL: https://issues.apache.org/jira/browse/FLINK-29648
 Project: Flink
  Issue Type: Bug
  Components: API / Python
Affects Versions: 1.16.0
Reporter: Juntao Hu
 Fix For: 1.16.0


The following code raises "TypeError: The java type info: LocalDateTime is not
supported in PyFlink currently.":
{code:python}
t_env.to_data_stream(t).key_by(...)
{code}
However, this works:
{code:python}
t_env.to_data_stream(t).map(lambda r: r).key_by(...)
{code}
Although we added Python coders for LocalTimeTypeInfo in 1.16, there is no
corresponding typeinfo on the Python side. So it works when a user does
processing immediately after to_data_stream, since the date/time data has
already been converted to Python objects, but when key_by tries to retrieve the
typeinfo from the Java TypeInformation, it fails.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-29658) LocalTimeTypeInfo support in DataStream API in PyFlink

2022-10-17 Thread Juntao Hu (Jira)
Juntao Hu created FLINK-29658:
-

 Summary: LocalTimeTypeInfo support in DataStream API in PyFlink
 Key: FLINK-29658
 URL: https://issues.apache.org/jira/browse/FLINK-29658
 Project: Flink
  Issue Type: New Feature
  Components: API / Python
Affects Versions: 1.15.2
Reporter: Juntao Hu
 Fix For: 1.15.3


LocalTimeTypeInfo is needed when calling `to_data_stream` on tables containing 
DATE/TIME/TIMESTAMP fields in PyFlink.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-29681) Python side-output operator not generated in some cases

2022-10-18 Thread Juntao Hu (Jira)
Juntao Hu created FLINK-29681:
-

 Summary: Python side-output operator not generated in some cases
 Key: FLINK-29681
 URL: https://issues.apache.org/jira/browse/FLINK-29681
 Project: Flink
  Issue Type: Bug
  Components: API / Python
Affects Versions: 1.16.0, 1.17.0
Reporter: Juntao Hu
 Fix For: 1.16.0, 1.17.0


If a SideOutputDataStream is used in `execute_and_collect`, `from_data_stream`,
or `create_temporary_view`, the side-output operator will not be properly
configured, since we rely on a bottom-up scan of transformations and there is
no solid transformation after the SideOutputTransformation in these cases. Thus
an error, "tuple object has no attribute get_fields_by_names", will be raised.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-30069) Expected prune behavior for matches with same priority

2022-11-17 Thread Juntao Hu (Jira)
Juntao Hu created FLINK-30069:
-

 Summary: Expected prune behavior for matches with same priority
 Key: FLINK-30069
 URL: https://issues.apache.org/jira/browse/FLINK-30069
 Project: Flink
  Issue Type: Bug
  Components: Library / CEP
Reporter: Juntao Hu


When a pattern produces several matches with the same priority, is it the
expected behavior to keep only the first dequeued one?

E.g.

pattern: A.followedByAny(B).followedBy(C)

seq: a1, b1, b2, b3, c1

after-match strategy: skip_to_next

Potential matches (a1, b1, c1), (a1, b2, c1) and (a1, b3, c1) reach the final
state at the same time and have the same priority, but (a1, b1, c1) is the
first one in the queue, thus (a1, b2, c1) and (a1, b3, c1) will be dropped by
the skip_to_next strategy.
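The observed pruning can be sketched as a toy model (a minimal sketch under assumptions, not Flink's real NFA or after-match skip implementation): among matches that share the same start event, only the first one dequeued survives.

```python
# Toy model of the observed skip_to_next behavior: keep only the first
# dequeued match per start event; drop later equal-priority matches.
def skip_to_next(matches):
    seen_starts, kept = set(), []
    for m in matches:          # matches are processed in queue order
        if m[0] not in seen_starts:
            seen_starts.add(m[0])
            kept.append(m)
    return kept

matches = [("a1", "b1", "c1"), ("a1", "b2", "c1"), ("a1", "b3", "c1")]
surviving = skip_to_next(matches)
# only ("a1", "b1", "c1") survives; the other two are pruned
```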



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-35302) Flink REST server throws exception on unknown fields in RequestBody

2024-05-06 Thread Juntao Hu (Jira)
Juntao Hu created FLINK-35302:
-

 Summary: Flink REST server throws exception on unknown fields in 
RequestBody
 Key: FLINK-35302
 URL: https://issues.apache.org/jira/browse/FLINK-35302
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / REST
Affects Versions: 1.19.0
Reporter: Juntao Hu
 Fix For: 1.19.1


As
[FLIP-401|https://cwiki.apache.org/confluence/display/FLINK/FLIP-401%3A+REST+API+JSON+response+deserialization+unknown+field+tolerance]
and FLINK-33258 mentioned, when an old-version REST client receives a response
from a new-version REST server, the strict JSON mapper makes the client throw
exceptions on newly added fields, which is inconvenient for situations where a
centralized client deals with REST servers of different versions (e.g. the k8s
operator).
But this incompatibility can also happen on the server side, when a new-version
REST client sends requests with additional fields to an old-version REST
server. Making the server tolerant of unknown fields can save clients from
backward-compatibility code.
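The proposed tolerance amounts to ignoring unknown fields during request deserialization. Flink's REST layer does this with Jackson in Java; the same idea in a minimal Python sketch (the field names here are hypothetical, not a real Flink request schema):

```python
import json

KNOWN_FIELDS = {"targetDirectory", "cancelJob"}  # hypothetical request schema

def parse_request(raw):
    # Tolerant parsing: silently drop fields this server version does not
    # know, instead of rejecting the whole request.
    data = json.loads(raw)
    return {k: v for k, v in data.items() if k in KNOWN_FIELDS}

# A newer client sends an extra field; the older server simply ignores it.
req = parse_request(
    '{"targetDirectory": "/sp", "cancelJob": false, "newOption": true}')
```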



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-30839) Relaxed looping group pattern makes the following strict contiguity relaxed too

2023-01-30 Thread Juntao Hu (Jira)
Juntao Hu created FLINK-30839:
-

 Summary: Relaxed looping group pattern makes the following strict 
contiguity relaxed too
 Key: FLINK-30839
 URL: https://issues.apache.org/jira/browse/FLINK-30839
 Project: Flink
  Issue Type: Bug
  Components: Library / CEP
Affects Versions: 1.16.1
Reporter: Juntao Hu
 Attachments: looping-group-pattern-relaxed-1.png

If a group pattern loops with relaxed contiguity, the strict contiguity between
the group pattern and the next pattern turns out to have the same effect as
relaxed contiguity.

E.g. for pattern:
{code:java}
Pattern.begin(Pattern.begin("A").next("B")).oneOrMore().next("C")
{code}
sequence (a b d c) is also matched, which is obviously wrong.

As the part of the NFA below shows, the ignore edge on the A state causes event
"d" to be unexpectedly ignored.

!looping-group-pattern-relaxed-1.png!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-30864) Optional pattern at the start of a group pattern not working

2023-02-01 Thread Juntao Hu (Jira)
Juntao Hu created FLINK-30864:
-

 Summary: Optional pattern at the start of a group pattern not 
working
 Key: FLINK-30864
 URL: https://issues.apache.org/jira/browse/FLINK-30864
 Project: Flink
  Issue Type: Bug
  Components: Library / CEP
Affects Versions: 1.16.1
Reporter: Juntao Hu


The optional pattern at the start of a group pattern turns out to be "not
optional", e.g.
{code:java}
Pattern.begin("A").next(Pattern.begin("B").optional().next("C")).next("D")
{code}
cannot match sequence "a1 c1 d1".



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-30885) Optional group pattern starts with non-optional looping pattern get wrong result on followed-by

2023-02-02 Thread Juntao Hu (Jira)
Juntao Hu created FLINK-30885:
-

 Summary: Optional group pattern starts with non-optional looping 
pattern get wrong result on followed-by
 Key: FLINK-30885
 URL: https://issues.apache.org/jira/browse/FLINK-30885
 Project: Flink
  Issue Type: Bug
  Components: Library / CEP
Affects Versions: 1.16.1
Reporter: Juntao Hu


{code:java}
Pattern.begin("A")
  .followedBy(
Pattern.begin("B").oneOrMore().greedy().consecutive()
  .next("C"))
  .optional()
  .next("D")
{code}
This can match "a1 e1 d1", which is not the expected behavior.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-30961) Remove deprecated tests_require in setup.py

2023-02-08 Thread Juntao Hu (Jira)
Juntao Hu created FLINK-30961:
-

 Summary: Remove deprecated tests_require in setup.py
 Key: FLINK-30961
 URL: https://issues.apache.org/jira/browse/FLINK-30961
 Project: Flink
  Issue Type: Improvement
  Components: API / Python
Reporter: Juntao Hu
 Fix For: 1.18.0


According to [https://github.com/pypa/setuptools/pull/1878], the `python
setup.py test` command is deprecated, thus the related `tests_require` could be
removed from setup.py.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-30962) Improve error messaging when launching py4j gateway server

2023-02-08 Thread Juntao Hu (Jira)
Juntao Hu created FLINK-30962:
-

 Summary: Improve error messaging when launching py4j gateway server
 Key: FLINK-30962
 URL: https://issues.apache.org/jira/browse/FLINK-30962
 Project: Flink
  Issue Type: Improvement
  Components: API / Python
Reporter: Juntao Hu
 Fix For: 1.18.0


In some cases, e.g. when JVM_ARGS does not match the Java executable version
that PyFlink uses when launching the py4j gateway server, there will only be a
general message: "Exception: Java gateway process exited before sending its
port number". It would be easier to investigate the cause of the error if we
exposed stderr from the Java process.
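The improvement can be sketched like this (a minimal sketch under assumptions: the function and message are illustrative, not PyFlink's actual launcher, and Python stands in for the JVM child process here):

```python
import subprocess
import sys

def launch_gateway(cmd):
    # Capture the child's stderr and attach it to the error message, so a
    # startup failure surfaces its real cause instead of only a generic
    # "exited before sending its port number" exception.
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out, err = proc.communicate()
    if proc.returncode != 0:
        raise RuntimeError(
            "Java gateway process exited before sending its port number:\n"
            + err.decode())
    return out

# Simulate a JVM rejecting bad JVM_ARGS, with Python standing in for java
bad_jvm = [sys.executable, "-c",
           "import sys; sys.stderr.write('Unrecognized option: -Xmx1x'); "
           "sys.exit(1)"]
try:
    launch_gateway(bad_jvm)
except RuntimeError as e:
    message = str(e)  # now contains the real cause from the child's stderr
```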





[jira] [Created] (FLINK-30969) Pyflink table example throws "module 'pandas' has no attribute 'Int8Dtype'"

2023-02-08 Thread Juntao Hu (Jira)
Juntao Hu created FLINK-30969:
-

 Summary: Pyflink table example throws "module 'pandas' has no 
attribute 'Int8Dtype'"
 Key: FLINK-30969
 URL: https://issues.apache.org/jira/browse/FLINK-30969
 Project: Flink
  Issue Type: Bug
  Components: API / Python
Affects Versions: 1.17.0
    Reporter: Juntao Hu
 Fix For: 1.17.0, 1.18.0


After apache-beam was upgraded to 2.43.0 in 1.17, running `python 
pyflink/examples/table/basic_operations.py` throws the following error:
{code:java}
Traceback (most recent call last):
  File "pyflink/examples/table/basic_operations.py", line 484, in 
    basic_operations()
  File "pyflink/examples/table/basic_operations.py", line 29, in 
basic_operations
    t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())
  File 
"/Users/vancior/Documents/Github/flink-back/flink-python/pyflink/table/table_environment.py",
 line 121, in create
    return TableEnvironment(j_tenv)
  File 
"/Users/vancior/Documents/Github/flink-back/flink-python/pyflink/table/table_environment.py",
 line 100, in __init__
    self._open()
  File 
"/Users/vancior/Documents/Github/flink-back/flink-python/pyflink/table/table_environment.py",
 line 1640, in _open
    startup_loopback_server()
  File 
"/Users/vancior/Documents/Github/flink-back/flink-python/pyflink/table/table_environment.py",
 line 1631, in startup_loopback_server
    from pyflink.fn_execution.beam.beam_worker_pool_service import \
  File 
"/Users/vancior/Documents/Github/flink-back/flink-python/pyflink/fn_execution/beam/beam_worker_pool_service.py",
 line 31, in 
    from apache_beam.options.pipeline_options import DebugOptions
  File 
"/Users/vancior/miniconda3/envs/flink-python/lib/python3.8/site-packages/apache_beam/__init__.py",
 line 92, in 
    from apache_beam import coders
  File 
"/Users/vancior/miniconda3/envs/flink-python/lib/python3.8/site-packages/apache_beam/coders/__init__.py",
 line 17, in 
    from apache_beam.coders.coders import *
  File 
"/Users/vancior/miniconda3/envs/flink-python/lib/python3.8/site-packages/apache_beam/coders/coders.py",
 line 59, in 
    from apache_beam.coders import coder_impl
  File "apache_beam/coders/coder_impl.py", line 63, in init 
apache_beam.coders.coder_impl
  File 
"/Users/vancior/miniconda3/envs/flink-python/lib/python3.8/site-packages/apache_beam/typehints/__init__.py",
 line 31, in 
    from apache_beam.typehints.pandas_type_compatibility import *
  File 
"/Users/vancior/miniconda3/envs/flink-python/lib/python3.8/site-packages/apache_beam/typehints/pandas_type_compatibility.py",
 line 81, in 
    (pd.Int8Dtype(), Optional[np.int8]),
AttributeError: module 'pandas' has no attribute 'Int8Dtype' {code}
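The nullable integer dtypes such as `pandas.Int8Dtype` were introduced in 
pandas 0.24.0, so an environment with an older pandas hits the AttributeError 
above. A minimal version-guard sketch (function names are illustrative, not 
part of PyFlink's API):

```python
# Illustrative helper: check whether a given pandas version ships the
# nullable integer dtypes (Int8Dtype etc.), added in pandas 0.24.0.
def parse_version(v: str):
    return tuple(int(p) for p in v.split(".")[:3])

def has_nullable_int_dtypes(pandas_version: str) -> bool:
    return parse_version(pandas_version) >= (0, 24, 0)

assert has_nullable_int_dtypes("1.1.5")
assert not has_nullable_int_dtypes("0.23.4")
```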





[jira] [Created] (FLINK-30981) explain_sql throws java method not exist

2023-02-08 Thread Juntao Hu (Jira)
Juntao Hu created FLINK-30981:
-

 Summary: explain_sql throws java method not exist
 Key: FLINK-30981
 URL: https://issues.apache.org/jira/browse/FLINK-30981
 Project: Flink
  Issue Type: Bug
  Components: API / Python
Affects Versions: 1.17.0
Reporter: Juntao Hu
 Fix For: 1.17.0, 1.18.0


Executing `t_env.explain_sql("ANY VALID SQL")` throws the following error:
{code:java}
Traceback (most recent call last):
  File "ISSUE/FLINK-25622.py", line 42, in 
    main()
  File "ISSUE/FLINK-25622.py", line 34, in main
    print(t_env.explain_sql(
  File 
"/Users/vancior/Documents/Github/flink-back/flink-python/pyflink/table/table_environment.py",
 line 799, in explain_sql
    return self._j_tenv.explainSql(stmt, j_extra_details)
  File 
"/Users/vancior/miniconda3/envs/flink-python/lib/python3.8/site-packages/py4j/java_gateway.py",
 line 1322, in __call__
    return_value = get_return_value(
  File 
"/Users/vancior/Documents/Github/flink-back/flink-python/pyflink/util/exceptions.py",
 line 146, in deco
    return f(*a, **kw)
  File 
"/Users/vancior/miniconda3/envs/flink-python/lib/python3.8/site-packages/py4j/protocol.py",
 line 330, in get_return_value
    raise Py4JError(
py4j.protocol.Py4JError: An error occurred while calling o11.explainSql. Trace:
org.apache.flink.api.python.shaded.py4j.Py4JException: Method explainSql([class 
java.lang.String, class [Lorg.apache.flink.table.api.ExplainDetail;]) does not 
exist
    at 
org.apache.flink.api.python.shaded.py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:321)
    at 
org.apache.flink.api.python.shaded.py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:329)
    at org.apache.flink.api.python.shaded.py4j.Gateway.invoke(Gateway.java:274)
    at 
org.apache.flink.api.python.shaded.py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at 
org.apache.flink.api.python.shaded.py4j.commands.CallCommand.execute(CallCommand.java:79)
    at 
org.apache.flink.api.python.shaded.py4j.GatewayConnection.run(GatewayConnection.java:238)
    at java.base/java.lang.Thread.run(Thread.java:829) {code}
#30668 changed TableEnvironment#explainSql into an interface default method. 
Since neither TableEnvironmentInternal nor TableEnvironmentImpl overrides it, 
this triggers a bug in py4j; see [https://github.com/py4j/py4j/issues/506] .





[jira] [Created] (FLINK-31017) Early-started partial match timeout not yield completed matches

2023-02-10 Thread Juntao Hu (Jira)
Juntao Hu created FLINK-31017:
-

 Summary: Timeout of an early-started partial match does not yield 
completed matches
 Key: FLINK-31017
 URL: https://issues.apache.org/jira/browse/FLINK-31017
 Project: Flink
  Issue Type: Bug
  Components: Library / CEP
Affects Versions: 1.16.1, 1.15.3, 1.17.0
Reporter: Juntao Hu
 Fix For: 1.18.0


Pattern example:
{code:java}
Pattern.begin("A").where(startsWith("a")).oneOrMore().consecutive().greedy()
.followedBy("B")
.where(count("A") > 2 ? startsWith("b") : startsWith("c"))
.within(Time.seconds(3));{code}
Sequence example, currently without any output:

a1 a2 a3 a4 c1

When the match [a3, a4, c1] completes, the partial match [a1, a2, a3, a4] 
started earlier, so NFA#processMatchesAccordingToSkipStrategy() emits no 
result, which is the expected behavior. However, when the partial match 
[a1, a2, a3, a4] times out, the completed match [a3, a4, c1] should be 
"freed" from the NFAState and output.





[jira] [Created] (FLINK-31040) Looping pattern notFollowedBy at end missing an element

2023-02-13 Thread Juntao Hu (Jira)
Juntao Hu created FLINK-31040:
-

 Summary: Looping pattern notFollowedBy at end missing an element
 Key: FLINK-31040
 URL: https://issues.apache.org/jira/browse/FLINK-31040
 Project: Flink
  Issue Type: Bug
  Components: Library / CEP
Affects Versions: 1.16.1, 1.15.3, 1.17.0
Reporter: Juntao Hu
 Fix For: 1.17.0, 1.15.4, 1.16.2, 1.18.0


Pattern: 
begin("A").oneOrMore().consecutive().notFollowedBy("B").within(Time.milliseconds(3))

Sequence: a1, a2, a3, a4 will produce the results [a1, a2], [a3], which 
obviously should be [a1, a2, a3] and [a3, a4].





[jira] [Created] (FLINK-31042) AfterMatchSkipStrategy not working on notFollowedBy ended pattern

2023-02-13 Thread Juntao Hu (Jira)
Juntao Hu created FLINK-31042:
-

 Summary: AfterMatchSkipStrategy not working on notFollowedBy ended 
pattern
 Key: FLINK-31042
 URL: https://issues.apache.org/jira/browse/FLINK-31042
 Project: Flink
  Issue Type: Bug
  Components: Library / CEP
Affects Versions: 1.16.1, 1.15.3, 1.17.0
Reporter: Juntao Hu
 Fix For: 1.17.0, 1.15.4, 1.16.2, 1.18.0


Pattern: begin("A", 
SkipToNext()).oneOrMore().allowCombinations().followedBy("C").notFollowedBy("B").within(Time.milliseconds(10L))

Sequence: a1, a2, a3, c1 will produce

[a1, a2, a3, c1]

[a1, a2, c1]

[a1, c1]

[a2, a3, c1]

[a2, c1]

[a3, c1]

Using SkipPastLastEvent() also produces the same result.





[jira] [Created] (FLINK-31044) Unexpected behavior using greedy on looping pattern with notFollowedBy at end

2023-02-13 Thread Juntao Hu (Jira)
Juntao Hu created FLINK-31044:
-

 Summary: Unexpected behavior using greedy on looping pattern with 
notFollowedBy at end
 Key: FLINK-31044
 URL: https://issues.apache.org/jira/browse/FLINK-31044
 Project: Flink
  Issue Type: Bug
  Components: Library / CEP
Affects Versions: 1.16.1, 1.15.3, 1.17.0
Reporter: Juntao Hu
 Fix For: 1.17.0, 1.15.4, 1.16.2, 1.18.0


Pattern without greedy: begin("A" , 
SKIP_PAST_LAST).oneOrMore().consecutive().notFollowedBy("B").within(Time.milliseconds(10))

Pattern with greedy: begin("A", 
SKIP_PAST_LAST).oneOrMore().consecutive().greedy().notFollowedBy("B").within(Time.milliseconds(10))

For the sequence a1, a2, a3, after FLINK-31040 and FLINK-31042 are fixed, 
both patterns should produce [a1, a2, a3], while currently the pattern with 
the greedy option produces no output.





[jira] [Created] (FLINK-31083) Python ProcessFunction with OutputTag cannot be reused

2023-02-15 Thread Juntao Hu (Jira)
Juntao Hu created FLINK-31083:
-

 Summary: Python ProcessFunction with OutputTag cannot be reused
 Key: FLINK-31083
 URL: https://issues.apache.org/jira/browse/FLINK-31083
 Project: Flink
  Issue Type: Bug
  Components: API / Python
Affects Versions: 1.16.1, 1.17.0
Reporter: Juntao Hu
 Fix For: 1.17.0, 1.16.2, 1.18.0


{code:java}
output_tag = OutputTag("side", Types.STRING())

def udf(i):
yield output_tag, i

ds1.map(udf).get_side_output(output_tag)
ds2.map(udf){code}
raises TypeError: cannot pickle '_thread.RLock' object
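The root cause can be demonstrated without Flink: Python cannot pickle a 
`_thread.RLock`, so once the function's `OutputTag` ends up holding a 
reference to such an unpicklable object after first use (an assumption about 
the mechanism, based on the error above), re-serializing the function fails.

```python
import pickle
import threading

# Minimal repro of the underlying serialization failure: an RLock is
# not picklable, which is exactly the TypeError reported above.
try:
    pickle.dumps(threading.RLock())
    raised = False
except TypeError as e:
    raised = True
    assert "RLock" in str(e)
assert raised
```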





[jira] [Created] (FLINK-31185) Python BroadcastProcessFunction not support side output

2023-02-22 Thread Juntao Hu (Jira)
Juntao Hu created FLINK-31185:
-

 Summary: Python BroadcastProcessFunction does not support side output
 Key: FLINK-31185
 URL: https://issues.apache.org/jira/browse/FLINK-31185
 Project: Flink
  Issue Type: Bug
  Components: API / Python
Affects Versions: 1.16.1
Reporter: Juntao Hu
 Fix For: 1.17.0, 1.16.2








[jira] [Created] (FLINK-31328) Greedy option on the looping pattern at the end not working

2023-03-05 Thread Juntao Hu (Jira)
Juntao Hu created FLINK-31328:
-

 Summary: Greedy option on the looping pattern at the end not 
working
 Key: FLINK-31328
 URL: https://issues.apache.org/jira/browse/FLINK-31328
 Project: Flink
  Issue Type: Bug
  Components: Library / CEP
Affects Versions: 1.16.1, 1.15.3, 1.17.0
Reporter: Juntao Hu


If the greedy option is used on a looping pattern at the end of the whole 
pattern, the matching result is not actually "greedy".

Example 1

pattern: A.oneOrMore().consecutive().greedy() (SKIP_TO_NEXT)

sequence: a1, a2, a3

result: [a1] [a2] [a3]

Example 2

pattern: B.next(A).oneOrMore().consecutive().greedy() (SKIP_TO_NEXT)

sequence: b1, a1, a2, a3

result: [b1 a1]
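As an analogy (regular expressions, not Flink CEP itself): a greedy looping 
pattern at the end should behave like the greedy quantifier `a+`, consuming 
all consecutive matching events into one maximal match, while the buggy 
behavior in Example 1 resembles the reluctant `a+?`, which emits many 
minimal matches.

```python
import re

# Expected "greedy" behavior: one maximal match, like [a1 a2 a3].
assert re.findall(r"a+", "aaa") == ["aaa"]

# Observed buggy behavior: many minimal matches, like [a1] [a2] [a3].
assert re.findall(r"a+?", "aaa") == ["a", "a", "a"]
```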


