[jira] [Commented] (FLINK-25021) Source/Sink for Azure Data Explorer (ADX)

2022-01-03 Thread Xue Wang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-25021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17468066#comment-17468066
 ] 

Xue Wang commented on FLINK-25021:
--

[~fpaul] if this looks good to you, then the [Azure Data Explorer Kafka Connect 
Kusto Sink Connector|https://github.com/Azure/kafka-sink-azure-kusto] would be a 
good starting point or reference from my perspective. As I mentioned previously, 
I'm thinking of wrapping the underlying synchronous ingestion with 
CompletableFuture as a temporary solution until ADX provides a real async API 
and implementation (or maybe I'll help them do so...).

Looking forward to your reply :)
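
One way to keep such a wrapper temporary is to hide it behind a small async-shaped interface, so that a native async client could replace it later without touching callers. The sketch below is illustrative only: `AsyncIngestor`, `SyncBackedIngestor`, and the `Function`-based blocking call are hypothetical names, not part of the ADX SDK or the Kafka connector.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.function.Function;

public class IngestorSeamSketch {

    /** Async-shaped contract the sink would depend on (hypothetical). */
    public interface AsyncIngestor {
        CompletableFuture<String> ingest(String record);
    }

    /** Temporary implementation: offloads a blocking call to an executor. */
    public static final class SyncBackedIngestor implements AsyncIngestor {
        private final Function<String, String> blockingCall;
        private final Executor executor;

        public SyncBackedIngestor(Function<String, String> blockingCall, Executor executor) {
            this.blockingCall = blockingCall;
            this.executor = executor;
        }

        @Override
        public CompletableFuture<String> ingest(String record) {
            // The blocking call runs on the supplied executor, not on the caller's thread
            // (unless the executor itself runs tasks inline).
            return CompletableFuture.supplyAsync(() -> blockingCall.apply(record), executor);
        }
    }

    public static void main(String[] args) {
        // Runnable::run executes inline for demonstration; a real sink would pass a thread pool.
        AsyncIngestor ingestor = new SyncBackedIngestor(r -> "ok:" + r, Runnable::run);
        System.out.println(ingestor.ingest("event-1").join());
    }
}
```

If a real async client became available, only the `AsyncIngestor` implementation would need to change under this arrangement.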

> Source/Sink for Azure Data Explorer (ADX)
> -
>
> Key: FLINK-25021
> URL: https://issues.apache.org/jira/browse/FLINK-25021
> Project: Flink
>  Issue Type: New Feature
>  Components: Connectors / Common
>Reporter: Xue Wang
>Assignee: Xue Wang
>Priority: Major
>
> Hi all,
> I'm considering implementing a source/sink for Azure Data Explorer (ADX). But 
> first I'd like to check with the community whether this is already a supported 
> scenario. If not, would you consider adding it to the list of official 
> connectors?
> FWIW, Azure Data Explorer is a widely used analytics service on Azure, well 
> suited for ad-hoc and time-series analysis over large volumes of structured, 
> semi-structured, and unstructured data. It can also integrate with a wide 
> range of data sources and visualization tools. 
> References:
> [General Intro by ADX 
> PM|https://vincentlauzon.com/2020/02/19/azure-data-explorer-kusto]
> [ADX 
> Documentation|https://docs.microsoft.com/en-us/azure/data-explorer/data-explorer-overview]
> [Ingest data using the Azure Data Explorer Java 
> SDK|https://docs.microsoft.com/en-us/azure/data-explorer/java-ingest-data]



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (FLINK-25021) Source/Sink for Azure Data Explorer (ADX)

2022-01-03 Thread Xue Wang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-25021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17468056#comment-17468056
 ] 

Xue Wang commented on FLINK-25021:
--

Hi [~fpaul], happy new year!

One potential scenario would be to use Flink as a streaming ETL solution and 
then ingest results into ADX for final time-series or user behavior analytics, 
supporting both interactive queries and data visualizations at low latency. 

The common scenarios depicted in [ADX 
solutions|https://docs.microsoft.com/en-us/azure/data-explorer/solution-architectures]
 often suggest using ADX to directly consume events from message queues such as 
Kafka or Azure Event Hubs. However, in actual development this is usually a 
sub-optimal solution for various reasons.

Furthermore, you may also find some docs describing other scenarios like [Azure 
Data Explorer and Stream Analytics for anomaly 
detection|https://azure.microsoft.com/en-us/blog/azure-data-explorer-and-stream-analytics-for-anomaly-detection/]
 and [When to use Azure Data Explorer or/and Azure Stream 
Analytics|https://docs.microsoft.com/en-us/azure/stream-analytics/azure-database-explorer-output#when-to-use-azure-data-explorer-orand-azure-stream-analytics]
 (Azure Stream Analytics is an Azure-native offering for stream processing).

I personally often treat ADX as a close alternative to Elasticsearch, since 
feature-wise they are close enough (though by design they are essentially 
different in many ways). I hope this also helps you understand why I'd like 
to use Flink with ADX. I believe AWS should have a similar offering via Kinesis; 
however, I cannot comment on it since I'm not familiar with AWS.


[jira] [Comment Edited] (FLINK-25021) Source/Sink for Azure Data Explorer (ADX)

2021-12-16 Thread Xue Wang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-25021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17461169#comment-17461169
 ] 

Xue Wang edited comment on FLINK-25021 at 12/17/21, 3:22 AM:
-

Hi [~MartijnVisser], I checked the Kusto Java client in detail today and found 
they haven't implemented an Async client as in C#. What if I wrap the 
synchronous APIs with CompletableFuture like 
[this|https://github.com/Azure/azure-kusto-java/blob/b3d73b8bb1334c2565e551b7d1d2853d14f3fd13/samples/src/main/java/FileIngestionCompletableFuture.java#L77]
 and provide a customized executor? Any concerns or suggestions? Or should I 
use FLIP-143 instead?


[jira] [Commented] (FLINK-25021) Source/Sink for Azure Data Explorer (ADX)

2021-12-07 Thread Xue Wang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-25021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17454912#comment-17454912
 ] 

Xue Wang commented on FLINK-25021:
--

[~MartijnVisser] No problem, Flink has a very active mailing list, which should 
keep you busy. I really appreciate your guidance here. I'll follow your 
suggestions and work on the async sink. I'll let you know in this thread if I 
have more questions. Thanks!


[jira] [Comment Edited] (FLINK-25021) Source/Sink for Azure Data Explorer (ADX)

2021-12-03 Thread Xue Wang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-25021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17453114#comment-17453114
 ] 

Xue Wang edited comment on FLINK-25021 at 12/3/21, 4:08 PM:


Hi [~MartijnVisser], I took some time to go through the FLIPs you mentioned 
and some recordings of FF21. To start with, I'd like to implement a sink for 
ADX using [azure-kusto-java|https://github.com/Azure/azure-kusto-java]. The SDK 
supports both sync and async ingest APIs. Currently, I'm planning to use the 
async ingestion APIs. 

Here are some questions I have so far:
 # For an async sink, what's the preferred API? SinkWriter 
([FLIP-177|https://cwiki.apache.org/confluence/display/FLINK/FLIP-177%3A+Extend+Sink+API])
 or AsyncSinkWriter 
([FLIP-171|https://cwiki.apache.org/confluence/display/FLINK/FLIP-171%3A+Async+Sink])?
 I see that AsyncSinkWriter is still marked as @PublicEvolving.
 # For SinkWriter, I'd like to reference the Kafka and Elasticsearch 
implementations. For AsyncSinkWriter, I can only see an example in the FLIP. 
I'd like to hear from you which implementations I should reference.
 # I learnt from FF21 that you're planning to have separate projects for 
connectors. So should I start this work in my own fork of Flink, or should I 
start a new project instead?


[jira] [Comment Edited] (FLINK-25021) Source/Sink for Azure Data Explorer (ADX)

2021-11-23 Thread Xue Wang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-25021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17448317#comment-17448317
 ] 

Xue Wang edited comment on FLINK-25021 at 11/24/21, 1:22 AM:
-

Thank you [~MartijnVisser] for the guidance. Glad to see these opportunities to 
contribute to the community and expand its ecosystem. If this looks good to 
you, shall I take on this issue? I'd love to also take the issues you mentioned 
once I finish this one.

Also, before I start, is there any guidance or best practice I should be aware 
of regarding implementing a Flink connector (I committed only one bug fix 
before this :))?


[jira] [Created] (FLINK-25021) Source/Sink for Azure Data Explorer (ADX)

2021-11-23 Thread Xue Wang (Jira)
Xue Wang created FLINK-25021:


 Summary: Source/Sink for Azure Data Explorer (ADX)
 Key: FLINK-25021
 URL: https://issues.apache.org/jira/browse/FLINK-25021
 Project: Flink
  Issue Type: New Feature
  Components: Connectors / Common
Reporter: Xue Wang


Hi all,

I'm considering implementing a source/sink for Azure Data Explorer (ADX). But 
first I'd like to check with the community whether this is already a supported 
scenario. If not, would you consider adding it to the list of official 
connectors?

FWIW, Azure Data Explorer is a widely used analytics service on Azure, well 
suited for ad-hoc and time-series analysis over large volumes of structured, 
semi-structured, and unstructured data. It can also integrate with a wide range 
of data sources and visualization tools. 

References:

[General Intro by ADX 
PM|https://vincentlauzon.com/2020/02/19/azure-data-explorer-kusto]

[ADX 
Documentation|https://docs.microsoft.com/en-us/azure/data-explorer/data-explorer-overview]

[Ingest data using the Azure Data Explorer Java 
SDK|https://docs.microsoft.com/en-us/azure/data-explorer/java-ingest-data]





[jira] [Commented] (FLINK-20781) UnalignedCheckpointITCase failure caused by NullPointerException

2021-01-01 Thread Xue Wang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-20781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17257224#comment-17257224
 ] 

Xue Wang commented on FLINK-20781:
--

https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=11557&view=logs&j=34f41360-6c0d-54d3-11a1-0292a2def1d9&t=2d56e022-1ace-542f-bf1a-b37dd63243f2&l=9459

> UnalignedCheckpointITCase failure caused by NullPointerException
> 
>
> Key: FLINK-20781
> URL: https://issues.apache.org/jira/browse/FLINK-20781
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.12.0, 1.13.0
>Reporter: Matthias
>Assignee: Jiangjie Qin
>Priority: Critical
>  Labels: test-stability
> Fix For: 1.12.1
>
>
> {{UnalignedCheckpointITCase}} fails due to {{NullPointerException}} (e.g. in 
> [this 
> build|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=11083&view=logs&j=59c257d0-c525-593b-261d-e96a86f1926b&t=b93980e3-753f-5433-6a19-13747adae66a&l=8798]):
> {code:java}
> [ERROR] Tests run: 10, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 
> 152.186 s <<< FAILURE! - in 
> org.apache.flink.test.checkpointing.UnalignedCheckpointITCase
> [ERROR] execute[Parallel cogroup, p = 
> 10](org.apache.flink.test.checkpointing.UnalignedCheckpointITCase)  Time 
> elapsed: 34.869 s  <<< ERROR!
> org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
>   at 
> org.apache.flink.runtime.jobmaster.JobResult.toJobExecutionResult(JobResult.java:147)
>   at 
> org.apache.flink.runtime.minicluster.MiniClusterJobClient.lambda$getJobExecutionResult$2(MiniClusterJobClient.java:119)
>   at 
> java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:616)
>   at 
> java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591)
>   at 
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
>   at 
> java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975)
>   at 
> org.apache.flink.runtime.rpc.akka.AkkaInvocationHandler.lambda$invokeRpc$0(AkkaInvocationHandler.java:229)
>   at 
> java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774)
>   at 
> java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750)
>   at 
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
>   at 
> java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975)
>   at 
> org.apache.flink.runtime.concurrent.FutureUtils$1.onComplete(FutureUtils.java:996)
>   at akka.dispatch.OnComplete.internal(Future.scala:264)
>   at akka.dispatch.OnComplete.internal(Future.scala:261)
>   at akka.dispatch.japi$CallbackBridge.apply(Future.scala:191)
>   at akka.dispatch.japi$CallbackBridge.apply(Future.scala:188)
>   at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:36)
>   at 
> org.apache.flink.runtime.concurrent.Executors$DirectExecutionContext.execute(Executors.java:74)
>   at 
> scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:44)
>   at 
> scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:252)
>   at akka.pattern.PromiseActorRef.$bang(AskSupport.scala:572)
>   at 
> akka.pattern.PipeToSupport$PipeableFuture$$anonfun$pipeTo$1.applyOrElse(PipeToSupport.scala:22)
>   at 
> akka.pattern.PipeToSupport$PipeableFuture$$anonfun$pipeTo$1.applyOrElse(PipeToSupport.scala:21)
>   at scala.concurrent.Future$$anonfun$andThen$1.apply(Future.scala:436)
>   at scala.concurrent.Future$$anonfun$andThen$1.apply(Future.scala:435)
>   at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:36)
>   at 
> akka.dispatch.BatchingExecutor$AbstractBatch.processBatch(BatchingExecutor.scala:55)
>   at 
> akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:91)
>   at 
> akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply(BatchingExecutor.scala:91)
>   at 
> akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply(BatchingExecutor.scala:91)
>   at 
> scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72)
>   at 
> akka.dispatch.BatchingExecutor$BlockableBatch.run(BatchingExecutor.scala:90)
>   at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)
>   at 
> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(ForkJoinExecutorConfigurator.scala:44)
>   at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>   at 
> akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>   at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>   at 
> 

[jira] [Commented] (FLINK-20781) UnalignedCheckpointITCase failure caused by NullPointerException

2021-01-01 Thread Xue Wang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-20781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17257223#comment-17257223
 ] 

Xue Wang commented on FLINK-20781:
--

https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=11546&view=logs&j=34f41360-6c0d-54d3-11a1-0292a2def1d9&t=2d56e022-1ace-542f-bf1a-b37dd63243f2&l=9380

> UnalignedCheckpointITCase failure caused by NullPointerException
> 
>
> Key: FLINK-20781
> URL: https://issues.apache.org/jira/browse/FLINK-20781
> Project: Flink
> Issue Type: Bug
> Affects Versions: 1.12.0, 1.13.0
> Reporter: Matthias
> Assignee: Jiangjie Qin
> Priority: Critical
> Labels: test-stability
> Fix For: 1.12.1
>
>
> {{UnalignedCheckpointITCase}} fails due to {{NullPointerException}} (e.g. in 
> [this 
> build|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=11083=logs=59c257d0-c525-593b-261d-e96a86f1926b=b93980e3-753f-5433-6a19-13747adae66a=8798]):
> {code:java}
> [ERROR] Tests run: 10, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 152.186 s <<< FAILURE! - in org.apache.flink.test.checkpointing.UnalignedCheckpointITCase
> [ERROR] execute[Parallel cogroup, p = 10](org.apache.flink.test.checkpointing.UnalignedCheckpointITCase)  Time elapsed: 34.869 s  <<< ERROR!
> org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
>   at org.apache.flink.runtime.jobmaster.JobResult.toJobExecutionResult(JobResult.java:147)
>   at org.apache.flink.runtime.minicluster.MiniClusterJobClient.lambda$getJobExecutionResult$2(MiniClusterJobClient.java:119)
>   at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:616)
>   at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591)
>   at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
>   at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975)
>   at org.apache.flink.runtime.rpc.akka.AkkaInvocationHandler.lambda$invokeRpc$0(AkkaInvocationHandler.java:229)
>   at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774)
>   at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750)
>   at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
>   at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975)
>   at org.apache.flink.runtime.concurrent.FutureUtils$1.onComplete(FutureUtils.java:996)
>   at akka.dispatch.OnComplete.internal(Future.scala:264)
>   at akka.dispatch.OnComplete.internal(Future.scala:261)
>   at akka.dispatch.japi$CallbackBridge.apply(Future.scala:191)
>   at akka.dispatch.japi$CallbackBridge.apply(Future.scala:188)
>   at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:36)
>   at org.apache.flink.runtime.concurrent.Executors$DirectExecutionContext.execute(Executors.java:74)
>   at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:44)
>   at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:252)
>   at akka.pattern.PromiseActorRef.$bang(AskSupport.scala:572)
>   at akka.pattern.PipeToSupport$PipeableFuture$$anonfun$pipeTo$1.applyOrElse(PipeToSupport.scala:22)
>   at akka.pattern.PipeToSupport$PipeableFuture$$anonfun$pipeTo$1.applyOrElse(PipeToSupport.scala:21)
>   at scala.concurrent.Future$$anonfun$andThen$1.apply(Future.scala:436)
>   at scala.concurrent.Future$$anonfun$andThen$1.apply(Future.scala:435)
>   at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:36)
>   at akka.dispatch.BatchingExecutor$AbstractBatch.processBatch(BatchingExecutor.scala:55)
>   at akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:91)
>   at akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply(BatchingExecutor.scala:91)
>   at akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply(BatchingExecutor.scala:91)
>   at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72)
>   at akka.dispatch.BatchingExecutor$BlockableBatch.run(BatchingExecutor.scala:90)
>   at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)
>   at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(ForkJoinExecutorConfigurator.scala:44)
>   at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>   at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>   at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>   at 
> 

[jira] [Commented] (FLINK-20321) Get NPE when using AvroDeserializationSchema to deserialize null input

2020-12-31 Thread Xue Wang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-20321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17257043#comment-17257043
 ] 

Xue Wang commented on FLINK-20321:
--

Thank you, [~jark]. I just submitted the 
[PR|https://github.com/apache/flink/pull/14539].

> Get NPE when using AvroDeserializationSchema to deserialize null input
> --
>
> Key: FLINK-20321
> URL: https://issues.apache.org/jira/browse/FLINK-20321
> Project: Flink
> Issue Type: Bug
> Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile), Table SQL / Ecosystem
> Affects Versions: 1.12.0
> Reporter: Shengkai Fang
> Assignee: Xue Wang
> Priority: Major
> Labels: pull-request-available, sprint, starter
> Fix For: 1.13.0
>
>
> You can reproduce the bug by adding the following test to 
> {{AvroDeserializationSchemaTest}}:
> {code:java}
> @Test
> public void testSpecificRecord2() throws Exception {
>     // Build a deserializer for the generated Avro class Address.
>     DeserializationSchema<Address> deserializer =
>             AvroDeserializationSchema.forSpecific(Address.class);
>     // Null input triggers the NullPointerException shown in the stack trace.
>     Address deserializedAddress = deserializer.deserialize(null);
>     assertEquals(null, deserializedAddress);
> }
> {code}
> Exception stack:
> {code:java}
> java.lang.NullPointerException
>   at org.apache.flink.formats.avro.utils.MutableByteArrayInputStream.setBuffer(MutableByteArrayInputStream.java:43)
>   at org.apache.flink.formats.avro.AvroDeserializationSchema.deserialize(AvroDeserializationSchema.java:131)
>   at org.apache.flink.formats.avro.AvroDeserializationSchemaTest.testSpecificRecord2(AvroDeserializationSchemaTest.java:69)
> {code}
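
For illustration only, the behaviour the test above expects (null input yields 
a null record instead of an NPE) can be sketched with a null guard in front of 
the deserializer. This is a minimal sketch and not the code from the PR: the 
{{SimpleDeserializer}} interface and the {{nullSafe}} helper are hypothetical 
stand-ins, not Flink classes, so that the example runs without Flink on the 
classpath.

```java
import java.nio.charset.StandardCharsets;

public class NullSafeDeserializeSketch {

    /** Hypothetical stand-in for a deserialization schema; this is not Flink's interface. */
    interface SimpleDeserializer<T> {
        T deserialize(byte[] message);
    }

    /** Wraps a delegate so that null input returns null instead of reaching the delegate. */
    static <T> SimpleDeserializer<T> nullSafe(SimpleDeserializer<T> delegate) {
        return message -> message == null ? null : delegate.deserialize(message);
    }

    public static void main(String[] args) {
        // Assumed delegate: like the reported code path, it fails with an NPE on null input.
        SimpleDeserializer<String> raw = bytes -> new String(bytes, StandardCharsets.UTF_8);
        SimpleDeserializer<String> safe = nullSafe(raw);

        // Null input is short-circuited by the guard.
        System.out.println(safe.deserialize(null));
        // Non-null input is passed through to the delegate unchanged.
        System.out.println(safe.deserialize("addr".getBytes(StandardCharsets.UTF_8)));
    }
}
```

Whether the guard belongs inside {{deserialize}} itself or in a wrapper like 
this is a design choice for the actual fix; the sketch only shows the 
null-in, null-out contract that the test asserts.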



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-20321) Get NPE when using AvroDeserializationSchema to deserialize null input

2020-12-29 Thread Xue Wang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-20321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17256240#comment-17256240
 ] 

Xue Wang commented on FLINK-20321:
--

Thank you, [~sampadsaha5]. I've worked with Avro Serdes and Confluent Schema 
Registry on several projects. I'll definitely take this opportunity to 
learn more about Flink and contribute to the community. Feel free to get back 
when you have time; contributions are always welcome. Cheers!

> Get NPE when using AvroDeserializationSchema to deserialize null input
> --
>
> Key: FLINK-20321
> URL: https://issues.apache.org/jira/browse/FLINK-20321
> Project: Flink
> Issue Type: Bug
> Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile), Table SQL / Ecosystem
> Affects Versions: 1.12.0
> Reporter: Shengkai Fang
> Assignee: Sampad Kumar Saha
> Priority: Major
> Labels: sprint, starter
> Fix For: 1.13.0
>
>
> You can reproduce the bug by adding the following test to 
> {{AvroDeserializationSchemaTest}}:
> {code:java}
> @Test
> public void testSpecificRecord2() throws Exception {
>     // Build a deserializer for the generated Avro class Address.
>     DeserializationSchema<Address> deserializer =
>             AvroDeserializationSchema.forSpecific(Address.class);
>     // Null input triggers the NullPointerException shown in the stack trace.
>     Address deserializedAddress = deserializer.deserialize(null);
>     assertEquals(null, deserializedAddress);
> }
> {code}
> Exception stack:
> {code:java}
> java.lang.NullPointerException
>   at org.apache.flink.formats.avro.utils.MutableByteArrayInputStream.setBuffer(MutableByteArrayInputStream.java:43)
>   at org.apache.flink.formats.avro.AvroDeserializationSchema.deserialize(AvroDeserializationSchema.java:131)
>   at org.apache.flink.formats.avro.AvroDeserializationSchemaTest.testSpecificRecord2(AvroDeserializationSchemaTest.java:69)
> {code}





[jira] [Commented] (FLINK-20321) Get NPE when using AvroDeserializationSchema to deserialize null input

2020-12-29 Thread Xue Wang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-20321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17256055#comment-17256055
 ] 

Xue Wang commented on FLINK-20321:
--

Hi [~jark], I just joined the community and would like to work on some 
starter issues first to familiarize myself with the code contribution process. 
This issue seems easy, and I should be able to raise a PR to fix it in a day or 
two. Since we haven't heard from the current assignee for over a month, would 
you mind reassigning this issue to me?

> Get NPE when using AvroDeserializationSchema to deserialize null input
> --
>
> Key: FLINK-20321
> URL: https://issues.apache.org/jira/browse/FLINK-20321
> Project: Flink
> Issue Type: Bug
> Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile), Table SQL / Ecosystem
> Affects Versions: 1.12.0
> Reporter: Shengkai Fang
> Assignee: Sampad Kumar Saha
> Priority: Major
> Labels: sprint, starter
> Fix For: 1.13.0
>
>
> You can reproduce the bug by adding the following test to 
> {{AvroDeserializationSchemaTest}}:
> {code:java}
> @Test
> public void testSpecificRecord2() throws Exception {
>     // Build a deserializer for the generated Avro class Address.
>     DeserializationSchema<Address> deserializer =
>             AvroDeserializationSchema.forSpecific(Address.class);
>     // Null input triggers the NullPointerException shown in the stack trace.
>     Address deserializedAddress = deserializer.deserialize(null);
>     assertEquals(null, deserializedAddress);
> }
> {code}
> Exception stack:
> {code:java}
> java.lang.NullPointerException
>   at org.apache.flink.formats.avro.utils.MutableByteArrayInputStream.setBuffer(MutableByteArrayInputStream.java:43)
>   at org.apache.flink.formats.avro.AvroDeserializationSchema.deserialize(AvroDeserializationSchema.java:131)
>   at org.apache.flink.formats.avro.AvroDeserializationSchemaTest.testSpecificRecord2(AvroDeserializationSchemaTest.java:69)
> {code}


