Query about auto-inference of numPartitions for `JdbcIO#readWithPartitions`

2024-05-31 Thread Vardhan Thigle via user
Hi Beam Experts,

I have a small query about `JdbcIO#readWithPartitions`.


Context

JdbcIO#readWithPartitions seems to always default to 200 partitions
(DEFAULT_NUM_PARTITIONS). This value is set by default when the object is
constructed.

There seems to be no way to override it with a null value. Hence it seems
that the code that checks for a null value and tries to auto-infer the
number of partitions never runs.

I am trying to use this to read a tall table of unknown size, and the
pipeline always defaults to 200 partitions if the value is not set. The
default of 200 seems to fall short, as the workers go out of memory in the
reshuffle stage. Running with a higher number of partitions, such as 4K,
helps in my test setup.

Since the size is not known at the time of implementing the pipeline,
auto-inference might help set numPartitions to a reasonable value as per
the heuristic decided by the Beam code.
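
To make the reported behavior concrete, here is a minimal JDK-only sketch of the pattern described above. It is a simplified model, not Beam's actual code: `ReadConfig`, `resolvePartitions`, and the placeholder inference function are hypothetical names introduced for illustration, and the placeholder is not Beam's heuristic.

```java
import java.util.function.LongToIntFunction;

/** Simplified, hypothetical model of the reported behavior; not Beam's code. */
class PartitionDefaultSketch {
    static final int DEFAULT_NUM_PARTITIONS = 200;

    /** Mimics a builder whose numPartitions is pre-populated at construction. */
    static final class ReadConfig {
        private Integer numPartitions = DEFAULT_NUM_PARTITIONS;

        /** Explicit override; there is no way to reset the field to null. */
        ReadConfig withNumPartitions(int n) {
            if (n <= 0) {
                throw new IllegalArgumentException("numPartitions must be positive");
            }
            this.numPartitions = n;
            return this;
        }

        /** The inference branch is reachable only when numPartitions is null. */
        int resolvePartitions(long estimatedRows, LongToIntFunction inference) {
            if (numPartitions == null) {
                return inference.applyAsInt(estimatedRows); // never reached here
            }
            return numPartitions;
        }
    }

    public static void main(String[] args) {
        // Placeholder inference (an assumption for illustration only).
        LongToIntFunction placeholder = rows -> (int) Math.max(1, rows / 1_000_000);

        // Value not set: the default wins and the inference is skipped.
        System.out.println(new ReadConfig().resolvePartitions(4_000_000_000L, placeholder)); // 200

        // Explicit value: the workaround described in this message.
        System.out.println(new ReadConfig().withNumPartitions(4000)
                .resolvePartitions(4_000_000_000L, placeholder)); // 4000
    }
}
```

In this model the inference would only run if the field could start out as null, which is the gap this message asks about.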

Request for help

Could you please clarify a few doubts around this?

   1. Is this behavior intentional?
   2. Could you please explain the rationale behind the heuristic in L1398
      and DEFAULT_NUM_PARTITIONS=200?


I have also raised this as issues/31467 in case it needs any changes in the
implementation.


Regards and Thanks,
Vardhan Thigle,
+919535346204


Re: Query about `JdbcIO.PoolableDataSourceProvider`

2024-05-04 Thread Vardhan Thigle via user
A small correction: I intended to link to JdbcIO.html.

Regards and Thanks,
Vardhan Thigle,
+919535346204



On Sat, May 4, 2024 at 5:48 PM Vardhan Thigle 
wrote:

> Hi Beam Experts,
>
> I had a small query about `JdbcIO.PoolableDataSourceProvider`
>
> As per the main documentation of JdbcIO, (IIUC)
> `JdbcIO.PoolableDataSourceProvider` creates one DataSource per execution
> thread by default, which can overwhelm the source db.
>
> Whereas, as per the Javadoc of JdbcIO.PoolableDataSourceProvider:
>
> At most a single DataSource instance will be constructed during pipeline
> execution for each unique JdbcIO.DataSourceConfiguration within the
> pipeline.
>
> If I want a singleton poolable connection for a given source database and
> my pipeline is dealing with multiple source databases, do I need to wrap
> the `JdbcIO.PoolableDataSourceProvider` in another concurrent hash map
> (from the implementation it looks like that's what it does already and
> it's not needed)?
>
> I am a bit confused due to the variation in the 2 docs above (it's quite
> possible that I am interpreting them wrong).
>
> Would it be more advisable to roll out a custom class, as suggested in the
> main documentation of JdbcIO, in cases like:
>
> 1. Configuring the pool config.
> 2. Using an alternative source, like say Hikari, which, if I understand
>    correctly, is not possible with JdbcIO.PoolableDataSourceProvider.
>
>
>
>
> Regards and Thanks,
> Vardhan Thigle,
> +919535346204
>


Query about `JdbcIO.PoolableDataSourceProvider`

2024-05-04 Thread Vardhan Thigle via user
Hi Beam Experts,

I had a small query about `JdbcIO.PoolableDataSourceProvider`

As per the main documentation of JdbcIO, (IIUC)
`JdbcIO.PoolableDataSourceProvider` creates one DataSource per execution
thread by default, which can overwhelm the source db.

Whereas, as per the Javadoc of JdbcIO.PoolableDataSourceProvider:

At most a single DataSource instance will be constructed during pipeline
execution for each unique JdbcIO.DataSourceConfiguration within the
pipeline.

If I want a singleton poolable connection for a given source database and
my pipeline is dealing with multiple source databases, do I need to wrap
the `JdbcIO.PoolableDataSourceProvider` in another concurrent hash map
(from the implementation it looks like that's what it does already and
it's not needed)?

I am a bit confused due to the variation in the 2 docs above (it's quite
possible that I am interpreting them wrong).

Would it be more advisable to roll out a custom class, as suggested in the
main documentation of JdbcIO, in cases like:

1. Configuring the pool config.
2. Using an alternative source, like say Hikari, which, if I understand
   correctly, is not possible with JdbcIO.PoolableDataSourceProvider.
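
For reference, the "one instance per unique configuration" wording quoted above can be pictured as a map keyed by configuration. The following is a minimal JDK-only sketch of that idea. `DbConfig`, `FakePool`, and `PoolRegistry` are hypothetical stand-ins, not Beam classes, and this is not a claim about how `JdbcIO.PoolableDataSourceProvider` is implemented.

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

class PerConfigPoolSketch {
    /** Hypothetical stand-in for a data source configuration, compared by value. */
    static final class DbConfig {
        final String url;
        final String user;

        DbConfig(String url, String user) {
            this.url = url;
            this.user = user;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof DbConfig)) {
                return false;
            }
            DbConfig other = (DbConfig) o;
            return url.equals(other.url) && user.equals(other.user);
        }

        @Override
        public int hashCode() {
            return Objects.hash(url, user);
        }
    }

    /** Hypothetical stand-in for a pooled DataSource. */
    static final class FakePool {
        final DbConfig config;

        FakePool(DbConfig config) {
            this.config = config;
        }
    }

    /** Creates at most one pool per distinct configuration, even across threads. */
    static final class PoolRegistry {
        private final Map<DbConfig, FakePool> pools = new ConcurrentHashMap<>();
        private final AtomicInteger created = new AtomicInteger();

        FakePool get(DbConfig config) {
            // computeIfAbsent runs the factory at most once per key.
            return pools.computeIfAbsent(config, c -> {
                created.incrementAndGet();
                return new FakePool(c);
            });
        }

        int createdCount() {
            return created.get();
        }
    }

    public static void main(String[] args) {
        PoolRegistry registry = new PoolRegistry();
        FakePool a1 = registry.get(new DbConfig("jdbc:postgresql://db-a/app", "reader"));
        FakePool a2 = registry.get(new DbConfig("jdbc:postgresql://db-a/app", "reader"));
        FakePool b = registry.get(new DbConfig("jdbc:postgresql://db-b/app", "reader"));
        System.out.println(a1 == a2);                // true: same configuration, same pool
        System.out.println(a1 == b);                 // false: different source database
        System.out.println(registry.createdCount()); // 2
    }
}
```

If the provider already behaves like this registry, an additional outer map keyed by source database would be redundant.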




Regards and Thanks,
Vardhan Thigle,
+919535346204


Re: Query about `JdbcIO`

2024-02-25 Thread Vardhan Thigle via user
Thanks for the pointer, xqhu@ !
Regards and Thanks,
Vardhan Thigle,
+919535346204


On Sun, Feb 25, 2024 at 2:19 AM XQ Hu wrote:

> I did not find BEAM-13846, but this suggests String was never supported:
>
>
> https://github.com/apache/beam/blob/master/sdks/java/io/jdbc/src/test/java/org/apache/beam/sdk/io/jdbc/JdbcUtilTest.java#L59
>
> However, you could use the code from the test to create yours.
>
> On Thu, Feb 22, 2024 at 11:20 AM Vardhan Thigle via user <
> user@beam.apache.org> wrote:
>
>> Hi,
>> I had a small query about `JdbcIO`.
>> As per the documentation
>> <https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/jdbc/JdbcIO.html>,
>> `readWithPartitions` is supported for Long, DateTime
>> <https://static.javadoc.io/joda-time/joda-time/2.10.10/org/joda/time/DateTime.html?is-external=true>,
>> and String types for the partition column, but in the top-of-tree code,
>> `PRESET_HELPERS` (ref
>> <https://github.com/apache/beam/blob/384c1034cd55fd0aa2a297581e113b9a4f6a4847/sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcUtil.java#L492>)
>> supports only Long and DateTime.
>>
>> Was the support for `String` rolled back? If yes, could you please help me
>> with the exact problem that caused the rollback (or any pointers to a
>> previous issue)?
>>
>> Regards and Thanks,
>> Vardhan Thigle,
>> +919535346204
>>
>


Query about `JdbcIO`

2024-02-22 Thread Vardhan Thigle via user
Hi,
I had a small query about `JdbcIO`.
As per the documentation, `readWithPartitions` is supported for Long,
DateTime, and String types for the partition column, but in the top-of-tree
code, `PRESET_HELPERS` supports only Long and DateTime.

Was the support for `String` rolled back? If yes, could you please help me
with the exact problem that caused the rollback (or any pointers to a
previous issue)?
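
For context on what a partition helper for a numeric column has to do, here is a minimal JDK-only sketch of splitting a Long range into sub-ranges. It is an illustration under stated assumptions, not Beam's implementation: `LongRangeSplitSketch` and `split` are hypothetical names, and the sketch assumes bounds far enough from the limits of `long` that `upper - lower + 1` does not overflow.

```java
import java.util.ArrayList;
import java.util.List;

class LongRangeSplitSketch {
    /**
     * Splits the inclusive range [lower, upper] into at most numPartitions
     * half-open ranges {start, end}; the last range ends at upper + 1.
     */
    static List<long[]> split(long lower, long upper, int numPartitions) {
        if (upper < lower || numPartitions <= 0) {
            throw new IllegalArgumentException("invalid bounds or partition count");
        }
        long span = upper - lower + 1;
        // Ceiling division so that the ranges cover the whole span.
        long stride = Math.max(1, (span + numPartitions - 1) / numPartitions);
        List<long[]> ranges = new ArrayList<>();
        for (long start = lower; start <= upper; start += stride) {
            long end = Math.min(start + stride, upper + 1);
            ranges.add(new long[] {start, end});
        }
        return ranges;
    }

    public static void main(String[] args) {
        // Prints [0, 4), [4, 8), [8, 10)
        for (long[] r : split(0, 9, 3)) {
            System.out.println("[" + r[0] + ", " + r[1] + ")");
        }
    }
}
```

A helper for a String column would need an equivalent way to order and subdivide values, which is the part the question above is asking about.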

Regards and Thanks,
Vardhan Thigle,
+919535346204
