I didn't understand what you mean by "it should also be possible to reuse the same connection of an InputFormat across InputSplits, i.e., calls of the open() method". At the moment the open() method calls establishConnection, so a new connection is created for each split. If I understood correctly, you're suggesting to create a pool in the InputFormat and simply call pool.borrow() in open() rather than establishConnection?
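
To make the question concrete, here is a minimal sketch of the simpler reading: the format keeps a single connection in a field and reuses it across open() calls, with no pool at all. Everything in it is an assumption for illustration only: the class name, the Supplier that stands in for establishConnection, and the releaseConnection() hook are hypothetical and are not the actual JDBCInputFormat code.

```java
import java.util.function.Supplier;

/**
 * Hypothetical sketch, not the real JDBCInputFormat: one connection is
 * created lazily and then reused for every split handled by this instance.
 */
class ConnectionReusingFormat<C> {
    // Stands in for establishConnection(); with JDBC, C would be java.sql.Connection.
    private final Supplier<C> connectionFactory;
    private C connection;               // kept alive between open() calls
    private int connectionsCreated = 0; // only here to make the reuse observable

    ConnectionReusingFormat(Supplier<C> connectionFactory) {
        this.connectionFactory = connectionFactory;
    }

    /** Called once per input split. */
    void open(String split) {
        if (connection == null) {
            // Only the first split pays the connection cost.
            // A real JDBC version would probably also re-check the connection's health here.
            connection = connectionFactory.get();
            connectionsCreated++;
        }
        // ... prepare and run the query for this split using 'connection' ...
    }

    /** Called once per split: releases per-split state but keeps the connection. */
    void close() {
        // ... close the statement / result set of the current split ...
    }

    /** Hypothetical hook, called once when the format instance is done with all splits. */
    void releaseConnection() {
        connection = null; // a real JDBC version would call connection.close() first
    }

    int connectionsCreated() {
        return connectionsCreated;
    }
}
```

With this shape, calling open()/close() for a million splits on the same instance would create one connection per parallel instance rather than one per split. A pool would only add something if several instances in the same JVM should share connections.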
On 14 Apr 2016 17:28, "Chesnay Schepler" <ches...@apache.org> wrote:

> On 14.04.2016 17:22, Fabian Hueske wrote:
>
>> Hi Flavio,
>>
>> that are good questions.
>>
>> 1) Replacing null values by default values and simply forwarding records is
>> very dangerous, in my opinion.
>> I see two alternatives: A) we use a data type that tolerates null values.
>> This could be a POJO that the user has to provide or Row. The drawback of
>> Row is that it is untyped and not easy to handle. B) We use Tuple and add
>> an additional field that holds an Integer which serves as a bitset to mark
>> null fields. This would be a pretty low level API though. I am leaning
>> towards the user-provided POJO option.
>>
> i would also lean towards the POJO option.
>
>>
>> 2) The JDBCInputFormat is located in a dedicated Maven module. I think we
>> can add a dependency to that module. However, it should also be possible to
>> reuse the same connection of an InputFormat across InputSplits, i.e., calls
>> of the open() method. Wouldn't that be sufficient?
>>
> this is the right approach imo.
>
>> Best, Fabian
>>
>> 2016-04-14 16:59 GMT+02:00 Flavio Pompermaier <pomperma...@okkam.it>:
>>
>>> Hi guys,
>>>
>>> I'm integrating the comments of Chesnay to my PR but there's a couple of
>>> thing that I'd like to discuss with the core developers.
>>>
>>> 1. about the JDBC type mapping (addValue() method at [1]: At the moment
>>>    if I find a null value for a Double, the getDouble of jdbc return 0.0.
>>>    Is it really the correct behaviour? Wouldn't be better to use a POJO or
>>>    the Row of datatable that can handle void? Moreover, the mapping between
>>>    SQL type and Java types varies much from the single JDBC implementation.
>>>    Wouldn't be better to rely on the Java type coming from using
>>>    resultSet.getObject() to get such a mapping rather than using the
>>>    ResultSetMetadata types?
>>> 2. I'd like to handle connections very efficiently because we have a use
>>>    case with billions of records and thus millions of splits and establish
>>>    a new connection each time could be expensive. Would it be a problem to
>>>    add apache pool dependency to the jdbc batch connector in order to
>>>    reuase the created connections?
>>>
>>> [1]
>>> https://github.com/fpompermaier/flink/blob/FLINK-3750/flink-batch-connectors/flink-jdbc/src/main/java/org/apache/flink/api/java/io/jdbc/JDBCInputFormat.java