[VOTE] FLIP-287: Extend Sink#InitContext to expose TypeSerializer, ObjectReuse and JobID

2023-06-16 Thread Joao Boto
Hi all,

Thank you to everyone for the feedback on FLIP-287[1]. Based on the
discussion thread [2], we have come to a consensus on the design and are
ready to take a vote to contribute this to Flink.

I'd like to start a vote for it. The vote will be open for at least 72
hours (excluding weekends) unless there is an objection or an insufficient
number of votes.

[1]
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=240880853
[2]https://lists.apache.org/thread/wb3myhqsdz81h08ygwx057mkn1hc3s8f


Best,
Joao Boto


[RESULT][VOTE] FLIP-287: Extend Sink#InitContext to expose TypeSerializer, ObjectReuse and JobID

2023-06-21 Thread Joao Boto
Hi all, I am happy to announce that FLIP-287: Extend Sink#InitContext to
expose TypeSerializer, ObjectReuse and JobID [1] has been accepted. There
are 8 approving votes, 6 of which are binding:
- Lijie Wang (binding)
- Jing Ge (binding)
- Tzu-Li (Gordon) Tai (binding)
- Zhu Zhu (binding)
- Yuepeng Pan (non-binding)
- Martijn Visser (binding)
- Leonard Xu (binding)
- John Roesler (non-binding)

There are no disapproving votes.

Thanks everyone for participating!

[1]
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=240880853

Best Regards,
João Boto


[VOTE] FLIP-239: Port JDBC Connector to FLIP-27&FLIP-143

2023-10-06 Thread Joao Boto
Hi all,

Thank you to everyone for the feedback on FLIP-239 [1]. Based on the
discussion thread [2] and some offline discussions, we have come to a
consensus on the design and are ready to take a vote to contribute this to
Flink.

I'd like to start a vote for it. The vote will be open for at least 72
hours (excluding weekends) unless there is an objection or an insufficient
number of votes.

[1]
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=217386271
[2]https://lists.apache.org/thread/yx833h5h3fjlyor0bfm32chy3sjw8hwt

Best,
Joao Boto


[DISCUSSION] FLIP-449: Reorganization of flink-connector-jdbc

2024-04-25 Thread Joao Boto
Hi all,

I'd like to start a discussion on FLIP-449: Reorganization of
flink-connector-jdbc [1].
As Flink continues to evolve, we've noticed an increasing level of
complexity within the JDBC connector.
The proposed solution is to address this complexity by separating the core
functionality from individual database components, thereby streamlining the
structure into distinct modules.

Looking forward to your feedback and suggestions, thanks.
Best regards,
Joao Boto

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-449%3A+Reorganization+of+flink-connector-jdbc


[VOTE] FLIP-449: Reorganization of flink-connector-jdbc

2024-05-09 Thread Joao Boto
Hi everyone,

Thanks for all the feedback, I'd like to start a vote on the FLIP-449:
Reorganization of flink-connector-jdbc [1].
The discussion thread is here [2].

The vote will be open for at least 72 hours unless there is an objection or
insufficient votes.

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-449%3A+Reorganization+of+flink-connector-jdbc
[2] https://lists.apache.org/thread/jc1yvvo35xwqzlxl5mj77qw3hq6f5sgr

Best
Joao Boto


[RESULT][VOTE] FLIP-449: Reorganization of flink-connector-jdbc

2024-05-15 Thread Joao Boto
Hi,

I am happy to say that FLIP-449: Reorganization of flink-connector-jdbc [1]
has been accepted.

The proposal has been accepted with 5 approving votes (1 binding) and no
disapproving votes (vote thread [2]):
- Rui Fan (binding)
- Yuepeng Pan (non-binding)
- Aleksandr Pilipenko (non-binding)
- Muhammet Orazov (non-binding)
- Jeyhun Karimov (non-binding)

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-449%3A+Reorganization+of+flink-connector-jdbc
[2] https://lists.apache.org/thread/mfqx711zoxgs3sbojr2slqrt2xv2h5q9

Thanks to all involved.

Best,
Joao Boto


[RESULT][VOTE] FLIP-449: Reorganization of flink-connector-jdbc

2024-05-22 Thread Joao Boto
Hi,

I am happy to say that FLIP-449: Reorganization of flink-connector-jdbc [1]
has been accepted.

The proposal has been accepted with 7 approving votes (3 binding) and no
disapproving votes (vote thread [2]):

- Ahmed Hamdy (non-binding)
- Aleksandr Pilipenko (non-binding)
- Jeyhun Karimov (non-binding)
- Jiabao Sun (binding)
- Leonard Xu (binding)
- Muhammet Orazov (non-binding)
- Rui Fan (binding)
- Yuepeng Pan (non-binding)

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-449%3A+Reorganization+of+flink-connector-jdbc
[2] https://lists.apache.org/thread/mfqx711zoxgs3sbojr2slqrt2xv2h5q9

Thanks to all involved.

Best,
Joao Boto


Out of space on connector runner (jdbc)

2024-07-10 Thread Joao Boto
Hi,

During the refactor related to FLIP-449 [1], we started seeing errors like
these on CI:

> "You are running out of disk space. The runner will stop working when
> the machine runs out of disk space. Free space left: 0 MB"

> "Warning: ForkStarter IOException: Unable to create file for report: No
> space left on device. See the dump file
> /home/runner/work/flink-connector-jdbc/flink-connector-jdbc/flink-connector-jdbc-oceanbase/target/surefire-reports/2024-07-09T09-59-05_229-jvmRun4.dumpstream"


The runner has 21GB free at start, and the Docker images used take up 12GB
(this could grow if we add more databases). Free space:

> Filesystem  Size  Used  Avail  Use%  Mounted on
> /dev/root   73G   52G   21G    72%   /

Even if Maven uses another 3GB (jars, class files, etc.), we should still
have around 5GB of free space.

I created a copy of the workflow to add some cleanup steps in the runner to
free up more space, and now I can run the tests:

> Filesystem  Size  Used  Avail  Use%  Mounted on
> /dev/root   73G   34G   40G    46%   /
>

These are the changes:
https://github.com/eskabetxe/flink-connector-jdbc/blob/FLINK-35363/.github/workflows/clean_space.yml#L47-L54

Could someone help us here?
Should we add an option to clean the runner to
"apache/flink-connector-shared-utils/.github/workflows/ci.yml@ci_utils"?
Another option could be to change Docker to store its files on "/mnt",
which has 66GB free (I haven't found a way to do this):

> Filesystem  Size  Used  Avail  Use%  Mounted on
> /dev/sdb1   74G   4.1G  66G    6%    /mnt
>


This is the last run:
https://github.com/apache/flink-connector-jdbc/actions/runs/9854907626
This is the last run of the clean CI:
https://github.com/apache/flink-connector-jdbc/actions/runs/9854907586

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-449%3A+Reorganization+of+flink-connector-jdbc




[DISCUSS] FLIP-287: Extend Sink#InitContext to expose ExecutionConfig and JobID

2023-01-13 Thread Joao Boto
Hi flink devs,

I'd like to start a discussion thread for FLIP-287 [1].
This comes from an offline discussion with @Lijie Wang about FLIP-239 [2],
especially for the sink [3].

Basically, the idea is to expose the ExecutionConfig and JobID on
SinkV2#InitContext. These changes are necessary to correctly migrate the
current sinks to SinkV2, such as JdbcSink and KafkaTableSink, which rely
on RuntimeContext.

Comments are welcome!
Thanks,

[1]
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=240880853
[2]
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=217386271
[3] https://issues.apache.org/jira/browse/FLINK-25421
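To make the proposal concrete, here is a minimal, self-contained sketch of the kind of information FLIP-287 asks InitContext to expose, and why a sink needs it. The interface and method names below are illustrative stand-ins, not the actual Flink API:

```java
// Hypothetical sketch of what FLIP-287 proposes to surface on Sink#InitContext.
// All names here are illustrative; they do not match Flink's API exactly.
interface InitContextSketch {
    // Job-level identity, e.g. to key sink-side state or metrics per job.
    String getJobId();

    // Whether object reuse is enabled (from the ExecutionConfig). A sink
    // that buffers incoming records must copy them when reuse is enabled,
    // because the runtime may mutate the same record instance afterwards.
    boolean isObjectReuseEnabled();
}

final class BufferingSinkSketch {
    private final InitContextSketch context;

    BufferingSinkSketch(InitContextSketch context) {
        this.context = context;
    }

    // Decide whether an incoming record must be defensively copied
    // before buffering: required exactly when object reuse is enabled.
    boolean mustCopyBeforeBuffering() {
        return context.isObjectReuseEnabled();
    }
}
```

This is the kind of decision (copy vs. no copy) that sinks like JdbcSink currently make via RuntimeContext and could not make with the SinkV2 InitContext before this FLIP.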


[RESULT][VOTE] FLIP-239: Port JDBC Connector to FLIP-27&FLIP-143

2023-10-13 Thread Joao Boto
Hi all,
I am happy to announce that FLIP-239: Port JDBC Connector to
FLIP-27&FLIP-143[1] has been accepted.
There are 8 approving votes, 5 of which are binding:
- Yuepeng Pan (non-binding)
- Jing Ge (binding)
- Samrat Deb (non-binding)
- Martijn Visser (binding)
- Sergey Nuyanzin (binding)
- Leonard Xu (binding)
- Qingsheng Ren (binding)
- Ahmed Hamdy (non-binding)

There are no disapproving votes.

Thanks everyone for participating!

[1]
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=217386271

Best Regards,
João Boto


[jira] [Created] (FLINK-13428) StreamingFileSink allow part file name to be configurable

2019-07-25 Thread Joao Boto (JIRA)
Joao Boto created FLINK-13428:
-

 Summary: StreamingFileSink allow part file name to be configurable
 Key: FLINK-13428
 URL: https://issues.apache.org/jira/browse/FLINK-13428
 Project: Flink
  Issue Type: Improvement
  Components: Connectors / FileSystem
Reporter: Joao Boto


Allow the part file name to be configurable:
 * a partPrefix can be passed
 * an extension can be added if the writer defines one

The part prefix allows setting a more meaningful file name.

The extension allows systems like Athena or Presto to automatically detect
the file type and the compression, if applied.
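The naming scheme being requested can be sketched as follows. This is a self-contained illustration (not Flink code) assuming the usual `<prefix>-<subtask>-<counter>` part file pattern:

```java
// Self-contained sketch of configurable part file naming: a caller-supplied
// prefix plus an optional extension around the <subtask>-<counter> pattern.
final class PartFileNaming {
    private final String prefix;
    private final String suffix; // e.g. ".parquet.gz"; may be null/empty

    PartFileNaming(String prefix, String suffix) {
        this.prefix = prefix;
        this.suffix = suffix == null ? "" : suffix;
    }

    String fileNameFor(int subtaskIndex, long partCounter) {
        return prefix + "-" + subtaskIndex + "-" + partCounter + suffix;
    }
}
```

With prefix "data" and suffix ".gz", subtask 0 and counter 3 yield "data-0-3.gz", which lets engines like Athena or Presto infer the compression from the extension.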



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (FLINK-8249) Kinesis Producer doesn't configure region

2017-12-13 Thread Joao Boto (JIRA)
Joao Boto created FLINK-8249:


 Summary: Kinesis Producer doesn't configure region
 Key: FLINK-8249
 URL: https://issues.apache.org/jira/browse/FLINK-8249
 Project: Flink
  Issue Type: Bug
  Components: Kinesis Connector
Affects Versions: 1.4.0
Reporter: Joao Boto


Hi,

Setting these configurations on FlinkKinesisProducer:
properties.put(AWSConfigConstants.AWS_REGION, "eu-west-1");
properties.put(AWSConfigConstants.AWS_ACCESS_KEY_ID, "accessKey");
properties.put(AWSConfigConstants.AWS_SECRET_ACCESS_KEY, "secretKey");

throws this error:
{code}
17/12/13 10:50:11 ERROR LogInputStreamReader: [2017-12-13 10:50:11.290786] 
[0x57ba][0x7f31cbce5780] [error] [main.cc:266] Could not configure the 
region. It was not given in the config and we were unable to retrieve it from 
EC2 metadata.
17/12/13 10:50:12 ERROR KinesisProducer: Error in child process
org.apache.flink.kinesis.shaded.com.amazonaws.services.kinesis.producer.IrrecoverableError:
 Child process exited with code 1
at 
org.apache.flink.kinesis.shaded.com.amazonaws.services.kinesis.producer.Daemon.fatalError(Daemon.java:525)
at 
org.apache.flink.kinesis.shaded.com.amazonaws.services.kinesis.producer.Daemon.fatalError(Daemon.java:497)
at 
org.apache.flink.kinesis.shaded.com.amazonaws.services.kinesis.producer.Daemon.startChildProcess(Daemon.java:475)
at 
org.apache.flink.kinesis.shaded.com.amazonaws.services.kinesis.producer.Daemon.access$100(Daemon.java:63)
at 
org.apache.flink.kinesis.shaded.com.amazonaws.services.kinesis.producer.Daemon$1.run(Daemon.java:133)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
17/12/13 10:50:15 ERROR LogInputStreamReader: [2017-12-13 10:50:15.700441] 
[0x57c4][0x7ffb152b5780] [error] [AWS Log: ERROR](CurlHttpClient)Curl 
returned error code 28
17/12/13 10:50:15 ERROR LogInputStreamReader: [2017-12-13 10:50:15.700521] 
[0x57c4][0x7ffb152b5780] [error] [AWS Log: 
ERROR](EC2MetadataClient)Http request to Ec2MetadataService failed.
{code}

After some investigation, the region is never set, and I think this is the
reason. In this commit:
https://github.com/apache/flink/commit/9ed5d9a180dcd871e33bf8982434e3afd90ed295#diff-f3c6c35f3b045df8408b310f8f8a6bc7
{code}
-   KinesisProducerConfiguration producerConfig = new KinesisProducerConfiguration();
-
-   producerConfig.setRegion(configProps.getProperty(ProducerConfigConstants.AWS_REGION));
+   // check and pass the configuration properties
+   KinesisProducerConfiguration producerConfig = KinesisConfigUtil.validateProducerConfiguration(configProps);

    producerConfig.setCredentialsProvider(AWSUtil.getCredentialsProvider(configProps));
{code}

This line was removed:
producerConfig.setRegion(configProps.getProperty(ProducerConfigConstants.AWS_REGION));
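A guard of the following shape would have caught this failure mode: validate that the region property is present before building the producer configuration, instead of letting the native child process die at runtime. The property key and class names below are illustrative, not the actual connector API:

```java
import java.util.Properties;

// Illustrative fail-fast guard: reject a missing region property up front
// rather than surfacing it later as an opaque child-process error.
final class ProducerConfigSketch {
    static String requireRegion(Properties configProps, String regionKey) {
        String region = configProps.getProperty(regionKey);
        if (region == null || region.trim().isEmpty()) {
            throw new IllegalArgumentException(
                "AWS region not configured under key '" + regionKey + "'");
        }
        return region;
    }
}
```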

cc [~tzulitai]





[jira] [Created] (FLINK-8424) [Cassandra Connector] Update Cassandra version to the latest one

2018-01-12 Thread Joao Boto (JIRA)
Joao Boto created FLINK-8424:


 Summary: [Cassandra Connector] Update Cassandra version to the latest one
 Key: FLINK-8424
 URL: https://issues.apache.org/jira/browse/FLINK-8424
 Project: Flink
  Issue Type: Improvement
Reporter: Joao Boto
Priority: Critical


The Cassandra connector is using a version released at the beginning of 2016.

This is to upgrade the Cassandra version to something newer.






[jira] [Created] (FLINK-13609) StreamingFileSink - reset part counter on bucket change

2019-08-07 Thread Joao Boto (JIRA)
Joao Boto created FLINK-13609:
-

 Summary: StreamingFileSink - reset part counter on bucket change
 Key: FLINK-13609
 URL: https://issues.apache.org/jira/browse/FLINK-13609
 Project: Flink
  Issue Type: Improvement
  Components: Connectors / FileSystem
Reporter: Joao Boto


When writing files with StreamingFileSink, on a bucket change we expect the
part counter to reset to 0.

As an example:
 * using DateTimeBucketAssigner (yyyy/MM/dd/HH)
 * and ten files per hour (for simplicity)

this will create:
 * bucket 2019/08/07/00 with files partfile-0-0 to partfile-0-9
 * bucket 2019/08/07/01 with files partfile-0-10 to partfile-0-19
 * bucket 2019/08/07/02 with files partfile-0-20 to partfile-0-29

but we expect this:
 * bucket 2019/08/07/00 with files partfile-0-0 to partfile-0-9
 * bucket 2019/08/07/01 with files partfile-0-0 to partfile-0-9
 * bucket 2019/08/07/02 with files partfile-0-0 to partfile-0-9

[~kkl0u] I don't know if this is the expected behavior (or if it can be
configured).
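The expected behavior amounts to keeping one part counter per bucket rather than a single global one. A self-contained sketch of that idea (an illustration, not StreamingFileSink internals):

```java
import java.util.HashMap;
import java.util.Map;

// Per-bucket part counters: each bucket id starts again at 0, which is
// the behavior the bucket examples above expect on a bucket change.
final class PerBucketCounter {
    private final Map<String, Long> counters = new HashMap<>();

    long nextPartCounter(String bucketId) {
        long next = counters.getOrDefault(bucketId, 0L);
        counters.put(bucketId, next + 1);
        return next;
    }
}
```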





[jira] [Created] (FLINK-13634) StreamingFileSink - allow bulkwriter to compress data

2019-08-07 Thread Joao Boto (JIRA)
Joao Boto created FLINK-13634:
-

 Summary: StreamingFileSink - allow bulkwriter to compress data
 Key: FLINK-13634
 URL: https://issues.apache.org/jira/browse/FLINK-13634
 Project: Flink
  Issue Type: Improvement
  Components: Connectors / FileSystem
Reporter: Joao Boto


I have developed a CompressFileWriter based on BulkWriter to compress data,
but I don't know where to put this code: inside filesystem or as a
flink-format module.

Another question: I used org.apache.commons.compress.compressors instead
of the Hadoop compressor.

[~kkl0u] could you guide me?
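For illustration, the core of the idea can be sketched with only the JDK's java.util.zip (the actual implementation discussed here uses org.apache.commons.compress); the class below loosely mirrors the BulkWriter contract but is a stand-alone sketch, not the proposed code:

```java
import java.io.IOException;
import java.io.OutputStream;
import java.util.zip.GZIPOutputStream;

// Sketch of a compressing bulk writer: serialized elements are written
// through a GZIP stream, and finish() writes the GZIP trailer so the
// resulting part file is a valid gzip archive.
final class GzipBulkWriterSketch {
    private final GZIPOutputStream out;

    GzipBulkWriterSketch(OutputStream target) throws IOException {
        this.out = new GZIPOutputStream(target);
    }

    void addElement(byte[] serializedElement) throws IOException {
        out.write(serializedElement);
    }

    void finish() throws IOException {
        out.finish(); // flush compressed data and trailer; target stays open
    }
}
```

Because the extension (e.g. ".gz") can signal the compression, this pairs naturally with the configurable part file naming discussed in FLINK-13428.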

 





[jira] [Created] (FLINK-13860) Flink Apache Kudu Connector

2019-08-26 Thread Joao Boto (Jira)
Joao Boto created FLINK-13860:
-

 Summary: Flink Apache Kudu Connector
 Key: FLINK-13860
 URL: https://issues.apache.org/jira/browse/FLINK-13860
 Project: Flink
  Issue Type: New Feature
Reporter: Joao Boto


Hi,

I'm the contributor and maintainer of this connector on the Bahir-Flink
project:

https://github.com/apache/bahir-flink/tree/master/flink-connector-kudu

But it seems that the Flink connectors on that project are less maintained,
and it's difficult to keep the code up to date, as PRs are not merged and
no version has ever been released, which makes the connector hard to use.

I would like to contribute that code to Flink, allowing others to
contribute to and use that connector.

[~fhueske] what do you think?


