[jira] [Commented] (FLINK-20578) Cannot create empty array using ARRAY[]

2024-03-17 Thread Nathan Taylor Armstrong Lewis (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-20578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17827826#comment-17827826
 ] 

Nathan Taylor Armstrong Lewis commented on FLINK-20578:
---

Does anyone know of a workaround to create an empty array literal until this 
issue is addressed?

> Cannot create empty array using ARRAY[]
> ---
>
> Key: FLINK-20578
> URL: https://issues.apache.org/jira/browse/FLINK-20578
> Project: Flink
> Issue Type: Sub-task
> Components: Table SQL / API
> Affects Versions: 1.11.2
> Reporter: Fabian Hueske
> Assignee: Eric Xiao
> Priority: Major
> Labels: pull-request-available, stale-assigned, starter
> Fix For: 1.20.0
>
> Attachments: Screen Shot 2022-10-25 at 10.50.42 PM.png, Screen Shot 
> 2022-10-25 at 10.50.47 PM.png, Screen Shot 2022-10-25 at 11.01.06 PM.png, 
> Screen Shot 2022-10-26 at 2.28.49 PM.png, image-2022-10-26-14-42-08-468.png, 
> image-2022-10-26-14-42-57-579.png
>
>
> Calling the ARRAY function without an element (`ARRAY[]`) results in an error 
> message.
> Is that the expected behavior?
> How can users create empty arrays?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-33817) Allow ReadDefaultValues = False for non primitive types on Proto3

2024-02-21 Thread Nathan Taylor Armstrong Lewis (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819243#comment-17819243
 ] 

Nathan Taylor Armstrong Lewis commented on FLINK-33817:
---

[~libenchao], yes, that should work. We are currently using a fork with this fix 
cherry-picked in, so we can stay on that until the latest development version 
becomes stable. (y)

> Allow ReadDefaultValues = False for non primitive types on Proto3
> -
>
> Key: FLINK-33817
> URL: https://issues.apache.org/jira/browse/FLINK-33817
> Project: Flink
> Issue Type: Improvement
> Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
> Affects Versions: 1.18.0
> Reporter: Sai Sharath Dandi
> Priority: Major
> Labels: pull-request-available
>
> *Background*
>  
> The current Protobuf format 
> [implementation|https://github.com/apache/flink/blob/c3e2d163a637dca5f49522721109161bd7ebb723/flink-formats/flink-protobuf/src/main/java/org/apache/flink/formats/protobuf/deserialize/ProtoToRowConverter.java]
>  always sets ReadDefaultValues=True when using Proto3. This can 
> cause severe performance degradation for large Protobuf schemas with OneOf 
> fields, because the entire generated code has to be executed during 
> deserialization even when certain fields are not present in the data to be 
> deserialized and all of their nested fields could be skipped. Proto3 
> supports hasXXX() methods for checking field presence on non-primitive types 
> since Protobuf version 
> [3.15|https://github.com/protocolbuffers/protobuf/releases/tag/v3.15.0]. In 
> our company's internal performance benchmarks, we have seen an almost 10x 
> difference in performance for one of our real production use cases when 
> ReadDefaultValues=False is allowed with Proto3. The exact 
> difference in performance depends on the schema complexity and data payload, 
> but we should allow users to set ReadDefaultValues=False in general.
>  
> *Solution*
>  
> Support ReadDefaultValues=False when using Proto3. We need to 
> be careful to check for field presence only on non-primitive types when 
> ReadDefaultValues is false and the version used is Proto3.
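
The effect described in the issue can be illustrated with a small, self-contained Python model. This is only a sketch under stated assumptions, not Flink's generated deserialization code: the schema dictionaries, the `convert` function, and the `stats` counter are all hypothetical names, and a decoded message is modelled as a plain dict that contains only the fields present in the payload.

```python
from typing import Any, Dict, Optional

# Hypothetical schema: primitive fields map to their proto3 default value,
# message-typed (non-primitive) fields map to a nested schema.
INNER_SCHEMA: Dict[str, Any] = {"label": ""}
OUTER_SCHEMA: Dict[str, Any] = {"count": 0, "name": "", "payload": INNER_SCHEMA}

# Counts how many nested messages were converted, as a proxy for the work done.
stats = {"nested_conversions": 0}


def convert(decoded: Dict[str, Any], schema: Dict[str, Any],
            read_default_values: bool) -> Dict[str, Optional[Any]]:
    """Convert a decoded message into a row.

    `decoded` only contains the fields that were present in the payload.
    """
    row: Dict[str, Optional[Any]] = {}
    for field, spec in schema.items():
        if isinstance(spec, dict):
            # Non-primitive field: presence can be checked (the hasXXX() case).
            if field in decoded or read_default_values:
                stats["nested_conversions"] += 1
                row[field] = convert(decoded.get(field, {}), spec, read_default_values)
            else:
                # Absent and defaults not requested: skip the nested conversion.
                row[field] = None
        else:
            # Primitive field: no presence check in this model, use the default.
            row[field] = decoded.get(field, spec)
    return row
```

In this model, `convert({"count": 3}, OUTER_SCHEMA, False)` leaves `payload` as `None` and never descends into the nested schema, whereas passing `True` builds a default-filled nested row. For deeply nested schemas, that skipped descent is where the reported savings would come from.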





[jira] [Comment Edited] (FLINK-33817) Allow ReadDefaultValues = False for non primitive types on Proto3

2024-02-13 Thread Nathan Taylor Armstrong Lewis (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817008#comment-17817008
 ] 

Nathan Taylor Armstrong Lewis edited comment on FLINK-33817 at 2/13/24 1:56 PM:


I can confirm that this issue affects Flink version 1.17.x as well.


was (Author: JIRAUSER304121):
I can confirm that this issue affects version 1.17.x as well.






[jira] [Commented] (FLINK-33817) Allow ReadDefaultValues = False for non primitive types on Proto3

2024-02-13 Thread Nathan Taylor Armstrong Lewis (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817008#comment-17817008
 ] 

Nathan Taylor Armstrong Lewis commented on FLINK-33817:
---

I can confirm that this issue affects version 1.17.x as well.






[jira] [Commented] (FLINK-28747) "target_id can not be missing" in HTTP statefun request

2024-02-09 Thread Nathan Taylor Armstrong Lewis (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-28747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17816124#comment-17816124
 ] 

Nathan Taylor Armstrong Lewis commented on FLINK-28747:
---

Protobuf 3 has an {{optional}} label. With it, an unset field can be 
distinguished from a field that was set to the default value.

Would adding {{optional}} to 
https://github.com/apache/flink-statefun/blob/accd75ea0109845c4b4c0ddd74021147af1439d4/statefun-sdk-protos/src/main/protobuf/io/kafka-egress.proto#L28
 be enough to give the SDKs a way to distinguish a valid empty-string key 
from an invalid unset key? I'm guessing other changes would be needed 
elsewhere, since that file is for the egress and I don't see an 
equivalent protobuf file for Kafka ingress messages.
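
To make the distinction concrete, here is a minimal Python sketch. It assumes that the failing check in the SDK is a plain truthiness test (the traceback only shows the error message, not the condition), and that a presence-aware field would surface an unset key as `None`. Both function names are hypothetical and not part of the statefun SDK.

```python
from typing import Optional


def check_truthy(target_id: Optional[str]) -> str:
    # Assumed shape of the current check: any falsy value is rejected,
    # so an empty string is treated the same as a missing id.
    if not target_id:
        raise ValueError("target_id can not be missing")
    return target_id


def check_presence(target_id: Optional[str]) -> str:
    # Presence-aware alternative: only None (key never set) is rejected,
    # so an empty-string key is accepted as a valid id.
    if target_id is None:
        raise ValueError("target_id can not be missing")
    return target_id
```

With these definitions, `check_presence("")` returns the empty string, while `check_truthy("")` raises the same `ValueError` as for a key that was never set. That is the ambiguity an {{optional}} field would let the SDKs resolve.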

> "target_id can not be missing" in HTTP statefun request
> ---
>
> Key: FLINK-28747
> URL: https://issues.apache.org/jira/browse/FLINK-28747
> Project: Flink
> Issue Type: Bug
> Components: Stateful Functions
> Affects Versions: statefun-3.0.0, statefun-3.2.0, statefun-3.1.1
> Reporter: Stephan Weinwurm
> Priority: Major
>
> Hi all,
> We've suddenly started to see the following exception in our HTTP statefun 
> function endpoints:
> {code}Traceback (most recent call last):
>   File "/src/.venv/lib/python3.9/site-packages/uvicorn/protocols/http/h11_impl.py", line 403, in run_asgi
>     result = await app(self.scope, self.receive, self.send)
>   File "/src/.venv/lib/python3.9/site-packages/uvicorn/middleware/proxy_headers.py", line 78, in __call__
>     return await self.app(scope, receive, send)
>   File "/src/worker/baseplate_asgi/asgi/baseplate_asgi_middleware.py", line 37, in __call__
>     await span_processor.execute()
>   File "/src/worker/baseplate_asgi/asgi/asgi_http_span_processor.py", line 61, in execute
>     raise e
>   File "/src/worker/baseplate_asgi/asgi/asgi_http_span_processor.py", line 57, in execute
>     await self.app(self.scope, self.receive, self.send)
>   File "/src/.venv/lib/python3.9/site-packages/starlette/applications.py", line 124, in __call__
>     await self.middleware_stack(scope, receive, send)
>   File "/src/.venv/lib/python3.9/site-packages/starlette/middleware/errors.py", line 184, in __call__
>     raise exc
>   File "/src/.venv/lib/python3.9/site-packages/starlette/middleware/errors.py", line 162, in __call__
>     await self.app(scope, receive, _send)
>   File "/src/.venv/lib/python3.9/site-packages/starlette/middleware/exceptions.py", line 75, in __call__
>     raise exc
>   File "/src/.venv/lib/python3.9/site-packages/starlette/middleware/exceptions.py", line 64, in __call__
>     await self.app(scope, receive, sender)
>   File "/src/.venv/lib/python3.9/site-packages/starlette/routing.py", line 680, in __call__
>     await route.handle(scope, receive, send)
>   File "/src/.venv/lib/python3.9/site-packages/starlette/routing.py", line 275, in handle
>     await self.app(scope, receive, send)
>   File "/src/.venv/lib/python3.9/site-packages/starlette/routing.py", line 65, in app
>     response = await func(request)
>   File "/src/worker/baseplate_statefun/server/asgi/make_statefun_handler.py", line 25, in statefun_handler
>     result = await handler.handle_async(request_body)
>   File "/src/.venv/lib/python3.9/site-packages/statefun/request_reply_v3.py", line 262, in handle_async
>     msg = Message(target_typename=sdk_address.typename, target_id=sdk_address.id,
>   File "/src/.venv/lib/python3.9/site-packages/statefun/messages.py", line 42, in __init__
>     raise ValueError("target_id can not be missing"){code}
> Interestingly, this started to happen in three separate Flink deployments 
> at the very same time. The only thing the three deployments have in common 
> is that they consume the same Kafka topics.
> No deployments took place around the time the issue started, which was 
> July 28th at 3:05 PM. We have been seeing the error continuously since then.
> We were also able to extract the request that Flink sends to the HTTP 
> statefun endpoint:
> {code}{'invocation': {'target': {'namespace': 'com.x.dummy', 'type': 
> 'dummy'}, 'invocations': [{'argument': {'typename': 
> 'type.googleapis.com/v2_event.Event', 'has_value': True, 'value': 
> '-redicated-'}}]}}
> {code}
> As you can see, either no `id` field was present in the `invocation.target` 
> object or the `target_id` was an empty string.
>  
> This is our module.yaml from one of the Flink deployments:
>  
> {code}
> version: "3.0"
> module:
>   meta:
>     type: remote
>   spec:
>     endpoints:
>       - endpoint:
>           meta:
>             kind: io.statefun.endpoints.v1/http
>           spec:
>             functions: com.x.dummy/dummy
>             urlPathTemplate: 

[jira] [Commented] (FLINK-21227) Upgrade Protobuf 3.7.0 for (power)ppc64le support

2024-02-09 Thread Nathan Taylor Armstrong Lewis (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-21227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17816116#comment-17816116
 ] 

Nathan Taylor Armstrong Lewis commented on FLINK-21227:
---

Is there any particular reason not to use the {{$\{protoc.version\}}} property 
that is defined in 
[https://github.com/apache/flink/blob/d2abd744621c6f0f65e7154a2c1b53bcaf78e90b/pom.xml#L161]
 to keep the version of protoc used consistent across the repo?

Specifically, that might look like changing 
[https://github.com/bivasda1/flink/blob/0d5ea7bccf8847b3fdc2049c381764b08dc895e9/flink-formats/flink-parquet/pom.xml#L253]
 to:
{code:java}
com.google.protobuf:protoc:${protoc.version}:exe:${os.detected.classifier}
{code}

I'm not familiar with Parquet, so this might be a bad idea for some 
Parquet-specific reason.

> Upgrade Protobuf 3.7.0 for (power)ppc64le support
> -
>
> Key: FLINK-21227
> URL: https://issues.apache.org/jira/browse/FLINK-21227
> Project: Flink
> Issue Type: Improvement
> Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
> Reporter: Bivas
> Priority: Not a Priority
> Labels: auto-deprioritized-major, auto-deprioritized-minor
>
> com.google.protobuf:*protoc:3.5.1:exe* is not supported on Power. Later 
> versions added multi-arch support, including Power (ppc64le). With 
> *protoc:3.7.0:exe*, the build succeeds and the E2E tests pass.
> https://github.com/bivasda1/flink/blob/master/flink-formats/flink-parquet/pom.xml#L253


